ABSTRACT

Survey results have suggested that, while teachers like to have test
information available, most do not have great skill or consistency in
interpreting test score data. Teachers who consider a certain type of
test very valuable or useful are less likely to question the accuracy
of the scores than are teachers who consider a test to be of little
value. Also, teacher judgments of test score accuracy are apparently
tempered on the basis of other information. The overall intent of the
project described in this report was to learn more about how teachers
perceive test score data; how they use test and nontest data in
decision making; factors which might be related to knowledge or
perception of test score data; and whether knowledge or perception of
test score data can be changed through brief instruction. Reported are
six studies that were conducted, focusing on: (1) teachers'
perceptions of test score accuracy; (2) teachers' interpretations of
pupil performance record; (3) teachers' interpretations of point and
interval score estimates; (4) teachers' estimates of costs and losses
in decisions; (5) factors influencing teacher estimation of pupil
performance; and (6) changes in teachers' knowledge and perceptions of
test score data. A description is given of the methodology used, the
results, and an interpretation of findings for each of the tests.
Appendixes include instruments used in the studies and tables of
results. (JD)
Overview of the Project

Background

There are today two major factors which have had, and will continue
to have, an enormous effect upon the shape and substance of education
in America. These factors are accountability and evaluation, both
instructional and program. One clear outcome of these factors' impact
is an increase in educators' dependence upon tests to provide
empirical data for making instructional or programmatic decisions or
changes. It is fairly easy to see the rapid increase in
state-sponsored every-pupil testing programs as but one example.
Shoemaker (1978) indicated that the number of states endorsing and
mandating some form of state-wide academic skills assessment has risen
from thirty to now over forty since the 1976-77 academic year. That
the public is concerned with the measured capabilities of students in
all or some of the traditional school content areas is underlined by
the considerable volume of literature devoted to competency testing
(cf. Phi Delta Kappan, May, 1978).

Even if the tests to be used meet stringent standards, such as
those used by the Center for the Study of Evaluation (Hoepfner et al.,
1972), or by any reviewer in the Mental Measurements Yearbook (Buros,
1978), for norm-referenced tests, or those proposed by Popham (1978) or
by Hambleton and Eignor (1978) for criterion-referenced tests, there is
no guarantee that the users will be able to interpret the results of
the tests sensibly. That is, a school or school district could conduct
a study which met all the criteria set forth by the Joint Dissemination
Review Panel publication, the JDRP Ideabook (Tallmadge, 1977), and yet
be a failure in terms of dissemination if the results released to local
teachers and administrators were misinterpreted. The simple fact that
most commercial test publishers will provide, upon request,
grade-equivalent scores -- in spite of their many technical flaws and
susceptibility to misinterpretation (Hills, 1981; Tallmadge & Horst,
1974) -- should alert the reader that the state of the art in test
score interpretation perhaps lags too far behind the level necessary
for sound decision-making. Two recent surveys, conducted
independently, canvassed the local coordinators of accountability or
testing in each school district in two states. One question on both
surveys asked these district coordinators to estimate what percentage
of teachers in their district could interpret a grade-equivalent score
properly. For Florida, the median estimate was 50% (Hills, 1977),
while for Mississippi, the median estimate was 40% (Morse, 1978).
Further, when these coordinators were asked to cite instances of the
worst mistakes in teachers' use of tests and measurements, examples of
misinterpretation of test scores were given most often in Hills'
survey (1977) and nearly most often in Morse's survey (1978).



Large-scale surveys of teachers have been conducted over the years
and, as a set, suggest that teachers like to have test information
available but don't have great skill or consistency in interpreting
test score data (Hastings et al., 1960; Goslin, 1967; Rudman et al.,
1980; Burry et al., 1982; Kellaghan, Madaus & Airasian, 1982).

Some Results of Interpretation Studies

Fleming and Anttonen (1971), in a study designed to replicate the
findings of Rosenthal and Jacobson (1968), were not able to induce the
type of expectancy effects which the "Pygmalion" study purported to
obtain. However, the study did show significant differences in the
degree of accuracy (or validity) which public school teachers were
willing to ascribe to sets of supplied test scores (either IQ, ability
percentile, or an inflated IQ for their pupils), based on the degree of
utility or value they attributed to the particular test type. That is,
teachers who considered a certain type of test very valuable or useful
were less likely to question the accuracy of the scores than were
teachers who considered the particular test to be of little value.

There is evidence that giving test scores in the form of
confidence bands reduces the amount of precision teachers will ascribe
to a test. Beggs, Mayer, and Lewis (1972) gave teachers either:
(a) an IQ score; (b) an IQ score with an explanation of test
reliability and validity; (c) an IQ score with a prediction of future
work; or (d) a percentile band (confidence band) for the IQ score, with
an explanation of test reliability and validity. Over a four-month
period, teachers' estimates of the accuracy of the scores were much
more consistent for those given IQ scores (conditions a, b) than for
those given a confidence band (condition d). Also, when the teachers
were asked to estimate their pupils' actual, as opposed to measured,
IQs, the teachers given the confidence bands were again less
consistent over the same period than were those given the IQ scores
(conditions a, b). Morse (1964) gave undergraduate students
hypothetical test scores expressed either as a percentile rank, narrow
percentile band (± .5 SD), or wide percentile band (± 1.0 SD). In
nearly all cases, the respondents perceived the percentile rank as
being significantly further from the mean (of 50) than was its
corresponding narrow percentile band, which was in turn perceived as
being significantly further from the mean than its corresponding wide
percentile band. In other words, the more realistically the accuracy
of the hypothetical test was represented (by increasing the confidence
band), the less willing were the respondents to suggest that the given
scores differed from the mean. Thus, for genuinely low scores, the
respondents were in fact over-estimating the relative position, while
for genuinely high scores the respondents would under-estimate the
relative position.
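
To make the narrow and wide percentile bands concrete, the following is a minimal sketch of how such a band might be built around a percentile rank. It assumes that scores are normally distributed and that the band is formed by moving a fixed number of standard deviations either side of the matching z-score; the report does not state how the bands in the cited study were actually constructed, and the function name `percentile_band` is hypothetical.

```python
from statistics import NormalDist


def percentile_band(percentile_rank, half_width_sd):
    """Return (lower, upper) percentile ranks for a band extending
    half_width_sd standard deviations either side of the z-score that
    corresponds to percentile_rank (normality is an assumption)."""
    std_normal = NormalDist()  # mean 0, SD 1
    z = std_normal.inv_cdf(percentile_rank / 100.0)
    lower = 100.0 * std_normal.cdf(z - half_width_sd)
    upper = 100.0 * std_normal.cdf(z + half_width_sd)
    return round(lower), round(upper)


# Narrow (0.5 SD) and wide (1.0 SD) bands for three illustrative ranks.
for rank in (16, 50, 84):
    print(rank, percentile_band(rank, 0.5), percentile_band(rank, 1.0))
```

Under these assumptions a percentile rank of 50 becomes a band of 31 to 69 at half a standard deviation and 16 to 84 at one standard deviation, which shows how quickly a band widens on the percentile scale.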

Teachers apparently temper judgments of test score accuracy on the
basis of other information. Frederickson and Marchie (1966) gave a
small group of teachers hypothetical protocol data including an IQ
score, aptitude score, and basic skills score, together with a pupil's
class performance. Teachers were asked whether they should accept the
score as valid, question its validity, or make no judgment. The highest
acceptance rates were noted for high scores versus average class
performance, while the lowest acceptance of the test scores as being
valid was noted for average or low IQ score versus high class
performance. While this indicates some questioning of test score
accuracy under certain conditions, it also indicates that the tests are
being perceived as valid predictors of class performance. This explains
why teachers might feel comfortable knowing that a student whose class
performance is average may have a high IQ score (e.g., an
"under-achiever"), and feel less comfortable when told that a student
whose class performance is high may have an average or low IQ (e.g.,
implying that teachers do not accept the notion of an "overachiever").
Leiter (1976) found when teachers were given achievement test scores of
their students in percentile rank units, it was not uncommon for them
to draw upon their knowledge of unrelated student background
information in order to interpret the scores. This was the outcome in
spite of the fact that the teachers were asked merely to interpret the
pupils' performance on the test.

Practical Significance of the Problem

The Title I evaluation models, now required for use in all ESEA
Title I program evaluation, call for the use of a placement or
selection test and an evaluation instrument, and have introduced an
entirely new score metric, the normal curve equivalent (NCE). A large
number of decisions about individual students, based on test scores,
must surely be taking place almost daily as a result. While widely
critiqued for its lack of experimental rigor since publication, the
manuscript Pygmalion in the Classroom (Rosenthal & Jacobson, 1968)
raised some interesting questions as to how teachers use test score
information, whether consciously or not. If many teachers are not able
to interpret test scores properly, as suggested above, and if there is
but one grain of truth to the "Pygmalion" notion that test scores are
accorded an inordinate amount of weight by teachers, the need for a
study of how teachers interpret different test score data, and whether
this capability may be improved through brief training, ought to be
very clear. In all likelihood, compensatory programs such as Title I
result in test scores being used far more often in making selection
and placement decisions. If the decision-makers have not learned how
to interpret test score information properly, then many students stand
to suffer.

Project Description

This project was composed of six separate substudies, each of
which was initiated to answer one or more specific research questions.
Overall, a total of 474 public school teachers from twenty-six school
districts in Mississippi participated in one or more phases of the
study. The purpose and focus of each study is explained below. The
overall intent was to learn more about how teachers perceive test score
data; how teachers use test and nontest data in decision-making;
factors which might be related to knowledge or perception of test score
data; and whether knowledge or perception of test score data can be
changed through brief instruction. While possibly raising more
questions than are answered, the substudies do appear to indicate
likely areas for future research.

I. Teachers' Perceptions of Test Score Accuracy

The purpose of this study was to investigate how teachers choose to
use and interpret test information. The study included an examination
of: (a) How perceived validity of test scores is affected by the
congruence of test and nontest information; (b) The relative
perceptions of test score scale accuracies; and (c) The relative
perceptions of the utility of various types of test and nontest
information for making placement decisions. The object of the study
was to allow the examination of what types of data teachers choose to
use as well as how test and nontest performance information are
considered and combined.

II. Teachers' Interpretations of a Pupil Performance Record

The purpose of this study was to investigate how teachers
interpret pupil performance record data. The study included an
examination of: (a) Which of the available types of performance
measures teachers use in drawing initial judgments of pupil
performance; and (b) The type of performance measure teachers believe
to be the most reliable. The object of the study was to allow the
determination of what types of data serve to mediate judgment of a
student's capability and what type of data is thought to be most
trustworthy.

III. Teachers' Interpretations of Point and Interval Score Estimates

The purpose of this study was to investigate how teachers perceive
test scores depending upon whether a point or interval estimate is
provided. Specifically, the study included an examination of:
(a) Whether practicing educators interpret point and interval estimates
differently; (b) Whether the width of an interval estimate affects the
resulting perception of a score; and (c) Whether any systematic trends
in the types of scores could be discerned. The object of the study was
to allow the determination of whether reporting point or interval
estimates of performance would result in different perceptions or
interpretations of the scores.

IV. Teachers' Estimates of Costs and Losses in Decisions

The purpose of this study was to investigate how teachers perceive
the costs or losses associated with incorrect decisions or outcomes,
and how these relate to the judged likelihood of such outcomes. The
study included an examination of: (a) The frequency with which
standardized achievement test scores accurately estimate pupil skill,
as judged by teachers; (b) The types of outcomes or decisions which
teachers perceive to be less desirable; and (c) The relative rates of
incorrect decisions or outcomes being made, as judged by teachers.
The object of the study was to allow the determination of what type
of decision-making system might be judged to operate in education
settings.

V. Factors Influencing Teacher Estimation of Pupil Performance

The purpose of this study was to investigate some of the factors
governing the dynamics of how teachers interpret performance
information in making decisions about pupils. These factors included:
(a) valence of information (positive or negative); (b) congruence of
follow-up information with initial performance; (c) reliability of
information; and (d) gender of the pupils. Information protocols of
hypothetical students were presented to teachers, and teachers were
asked to judge, on the basis of information given, the chances of the
"student" succeeding in school. The object of the study was to allow
the examination of how much impact, if any, the four factors may have
in the types of judgments teachers make concerning students, as well as
whether selected teacher characteristics make any difference in the
observed judgments.

VI. Changes in Teachers' Knowledge and Perceptions of Test Score Data

The purpose of this study was to investigate whether, and to what
degree, measured knowledge or perceptions of test score data can be
changed as a result of short-term, directed training. The study was
designed to allow an examination of: (a) What changes in knowledge or
perception of test score data could result from short-term training;
and (b) Whether differences could be detected which were attributable
to teacher characteristics of measurement background, certification or
teaching level. The object of the study was to determine the efficacy
of a modest training intervention in measured knowledge or perceptions
teachers possess concerning test score data.



Substudy I

Teachers' Perceptions of Test Score Accuracy

How teachers choose to use and interpret test information is an
aspect of educational practice which has not been extensively
researched. Prior research results suggest that the degree to which
test data are used by teachers depends upon how accurate or dependable
the scores or data are perceived to be. This study was initiated to
investigate three specific aspects of how teachers perceive test score
data: (a) How do perceptions of test data accuracy vary as a function
of the congruence of test data with other performance indicators?
(b) Which types of common score scales do teachers believe most
accurately summarize test performance? (c) Of the various types of
test and nontest data which may be used for making placement decisions,
which do teachers believe to be most accurate? These three research
questions serve to define the scope of the present study. Given the
present level of understanding of how decisions are made, the answers
to these questions could provide insight as to how and what kind of
test data should be presented to enhance the likelihood of sound use.



Methodology

Sample

Participants were 143 public school teachers from fourteen
different school districts in Mississippi. These participants were
in attendance at a workshop on test development. About 82% were
female and 18% were male. The school districts represented were
from the western, central and northeastern portions of the state.

Instruments

Data for the first research question came from participants'
responses to a set of items asking the reader to judge the validity
of a given test score, in light of other known, nontest information.
Each respondent was presented eight such items (there were sixteen
different items in all). The items presented various combinations
of test score and nontest score data. Nontest score data were such
data as marks in a given course. In each item, respondents were
asked to judge the test score as valid, questionable or invalid.
Items were classified as congruent if both the test and nontest data
were high or low. However, incongruent combinations (e.g., high test
score presented with low nontest score) were also included.

An example of a congruent (high test score; high nontest score) item is:

    A female student, eighth grade, has average grade of A. Her
    new CAT-77 reading comprehension percentile rank is 91. This
    score is:

    a. Valid
    b. Questionable
    c. Invalid

An example of an incongruent (low test score; high nontest score)
item is:

    A female student, twelfth grade, has a semester average of 93 in
    Senior English. Her new CAT-77 language arts percentile rank is
    30. This score is:

    a. Valid
    b. Questionable
    c. Invalid

Internal consistency reliability for this measure, estimated by
coefficient alpha, was .80. A copy of the full set of items is
presented in Appendix A.
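
As an illustration of how a coefficient alpha estimate of this kind is obtained, the sketch below applies the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), to a small set of made-up ratings on the three-point validity scale. The data and the function name `coefficient_alpha` are hypothetical; the report's own item responses are not reproduced here.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha.

    scores: one list per respondent, each holding that respondent's
    item scores (all respondents answer the same k items).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([resp[i] for resp in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(resp) for resp in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 respondents x 4 items, 3 = valid ... 1 = invalid.
ratings = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [3, 2, 3, 3],
    [1, 1, 2, 1],
    [2, 3, 3, 2],
]
print(round(coefficient_alpha(ratings), 2))
```

The estimate rises as items vary together across respondents, which is why it is read as a measure of internal consistency.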

Data for the second and third research questions came from
separate pair-comparison questionnaires. The first presented five
types of test score scales, including: Raw score (number right);
Percentile rank; Grade-equivalent score; CAT-77 ADSS (a proprietary
scale score); and Stanines. These five score types represent perhaps
the most widely used -- exclusive of the NCE scores -- score scales
in Mississippi. The second questionnaire presented seven sources of
information which could possibly be used in making pupil placement
decisions. These included: Prior course grades or marks;
Standardized achievement test scores; Prior teacher's written
recommendation; Individual I.Q. test; Prior school counselor's
written recommendation; Local criterion-referenced (CR) achievement
test scores; and Parents' description of child's school
accomplishments. A copy of these questionnaires is included in
Appendices B and C, respectively.

The method of pair comparisons requires that all possible pairs
of stimuli be presented in a forced-choice format; the respondent
must select one as preferable to the other. This method permits,
if the necessary assumptions hold, interval scaling of the relative
positions of the stimuli (Guilford, 1954). The order, sequence and
pairing of stimuli were generated by use of a random number table,
the intent being to avoid possible position bias.
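
One common way to turn forced-choice pair comparisons into interval scale values is Thurstone's Case V solution, sketched below. This is offered only as an illustration under stated assumptions: the report cites Guilford (1954) but does not say which scaling solution was applied, the preference counts are invented, and the clipping of extreme proportions is a convenience choice rather than anything taken from the source.

```python
from statistics import NormalDist


def case_v_scale_values(wins, n_judges):
    """Thurstone Case V scale values from a pair-comparison matrix.

    wins[i][j] is the number of judges who preferred stimulus i to
    stimulus j. Each proportion is converted to a unit-normal deviate
    and the deviates are averaged within each row.
    """
    std_normal = NormalDist()
    k = len(wins)
    values = []
    for i in range(k):
        z_sum = 0.0
        for j in range(k):
            if i == j:
                continue
            p = wins[i][j] / n_judges
            # Keep proportions away from 0 and 1, where the deviate is
            # undefined (an assumed convention, not from the report).
            p = min(max(p, 0.5 / n_judges), 1 - 0.5 / n_judges)
            z_sum += std_normal.inv_cdf(p)
        values.append(z_sum / (k - 1))
    return values


# Hypothetical counts for three stimuli judged by 100 respondents.
wins = [
    [0, 70, 90],
    [30, 0, 75],
    [10, 25, 0],
]
print([round(v, 2) for v in case_v_scale_values(wins, 100)])
```

The resulting values are centred on zero and are meaningful only relative to one another, so any linear rescaling (for example to a fixed lowest value or to T-scores) preserves the information they carry.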



For each questionnaire, respondents were told that there were no
"right" or "wrong" answers and that they should respond on the basis
of their own beliefs.

Specific instructions for the test score type questionnaire were:

    Each year, the state sponsors testing of students in grades 4, 6
    and 8 in basic skills on the California Achievement Test. Various
    types of scores are provided for students who take the test. For
    each of the following items, please select the type of score you
    believe would best help you, as an educator, to make sound
    decisions about what a student had, or had not, learned.

    Please circle the letter of the type of score you select for
    each item.

    Remember, you should choose the type of score YOU think would
    best help in making sound decisions about a student's skills.

Specific instructions for the types of data questionnaire were:

    When a new student comes to your school, some type of placement
    decision must be made. For each of the following questions,
    please circle the letter of the type of information you believe
    is likely to be MOST ACCURATE for making sound placement
    decisions.

All respondents were able to complete the longest questionnaire
easily within fifteen minutes. Only one questionnaire was
administered a day.

Results

Question 1: How do perceptions of test data accuracy vary as a
function of the congruence of the test data with other performance
indicators?

Summary statistics by possible congruence category are presented
in Table I-1. Higher scores represent greater perceived validity for
the category of interest. Scoring was on a simple three-point scale:
"valid" was assigned three points, a "questionable" rating was given
two, and "invalid" was scored as one point. From the results in
Table I-1, the reader may deduce that test information which was
congruent (e.g., low-low or high-high) was perceived as more valid
than was the test score information which was incongruent. There
was a sizable advantage in ratings for congruent and high score data
over those for congruent and low score data. For the incongruent
data, there was a slightly greater tendency for the respondents to
consider high test-low nontest matches as more believable than low
test and high nontest combinations. The magnitudes of these
differences are presented in Table I-2. The effect sizes shown in
Table I-2 range from small (.27) to very large (1.53). The overall
hypothesis of equal ratings among the congruence categories was
rejected at traditional alpha levels (F = 119.51; df = 3/1097;
p < .001).
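
The effect sizes reported for the congruence categories can be reproduced from the cell means and standard deviations. The following is a minimal sketch, assuming that each effect size is the absolute mean difference divided by a pooled standard deviation formed from the simple average of the two variances (all cells rest on the same 143 cases), and assuming that category labels read test-nontest, so that Low-High denotes a low test score paired with high nontest data. The names `cells` and `effect_size` are illustrative and do not come from the report.

```python
from itertools import combinations
from math import sqrt

# (mean, standard deviation) per test-nontest category, as read from
# the table of congruence category means; the cell assignment is an
# interpretation of the flattened table and the accompanying text.
cells = {
    "Low-Low": (2.27, 0.74),
    "Low-High": (1.81, 0.67),
    "High-Low": (1.98, 0.59),
    "High-High": (2.73, 0.52),
}


def effect_size(first, second):
    """Absolute standardized mean difference between two categories."""
    (mean_1, sd_1), (mean_2, sd_2) = cells[first], cells[second]
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)  # equal group sizes
    return abs(mean_1 - mean_2) / pooled_sd


for first, second in combinations(cells, 2):
    print(f"{first} vs {second}: {effect_size(first, second):.2f}")
```

Under these assumptions the smallest contrast is the one between the two incongruent categories and the largest is between the low test-high nontest and high-high categories, matching the range of .27 to 1.53 cited in the text.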

Question 2: Which types of common score scales do teachers believe
most accurately summarize test performance?



TABLE I-1

Summary of Congruence Category Means

                         Nontest Data
Test Data         Low Score        High Score

Low Score         2.27 (0.74)      1.81 (0.67)
High Score        1.98 (0.59)      2.73 (0.52)

NOTE: Figures in parentheses are standard deviations; all values based
on 143 cases.





TABLE I-2

Summary of Effect Size Estimates for Congruence Categories

                              Category
Category        Low-High      High-Low       High-High

Low-Low           .65        [illegible]        .72
Low-High                        .27            1.53
High-Low                                       1.35

NOTE: Effect size defined as $(\bar{X}_i - \bar{X}_j)/S_{\text{pooled}}$;
values based on 143 cases. Categories represent specific test-nontest
score combinations.
Table I-3 includes the results of the pair comparison judgments.
Overall, grade-equivalent scores were judged to be most accurate by
the teachers, followed by percentile ranks. Further behind, and
nearly equal in ranking, were raw scores and proprietary scale scores.
Bringing up the distant rear were stanines. The scale values may be
interpreted in a relative sense; that is, grade-equivalent scores
were preferred about twice as much as raw score and scale scores,
and were about six times more popular than stanines. Shifting to a
different scale (T-scores) removes the "anchor," but still permits
relative contrasts. It is interesting to note that one of the least
sound score scales is considered by teachers as most useful for making
decisions about pupils. On the other hand, the stanine, which was
designed to reflect the inherent uncertainty in a point estimate of
an examinee's score, is least preferred.
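
The shift from scale values to T-scores is a linear rescaling to a mean of 50 and a standard deviation of 10. The sketch below applies that standard conversion to the scale values for the five score types as read from the table of scale values; it assumes the sample (n - 1) standard deviation, which is the version that reproduces the reported S.D. of 2.00, and the names used are illustrative only.

```python
from statistics import mean, stdev

# Scale values for the five score types, as read from the table of
# scale values for test score types.
scale_values = {
    "Grade-equivalent score": 6.15,
    "Percentile rank": 5.41,
    "Raw score (number right)": 3.84,
    "Scale score (CAT-ADSS)": 3.41,
    "Stanine": 1.00,
}


def to_t_scores(values):
    """Rescale values linearly to mean 50 and standard deviation 10."""
    numbers = list(values.values())
    centre = mean(numbers)
    spread = stdev(numbers)  # sample SD; an assumption, see lead-in
    return {
        name: round(50 + 10 * (value - centre) / spread)
        for name, value in values.items()
    }


for name, t_score in to_t_scores(scale_values).items():
    print(f"{name}: {t_score}")
```

Because the conversion is linear, the ordering of the score types and the relative spacing between them are unchanged; only the origin and unit differ.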

Question 3: Which types of data do teachers believe to be most
accurate for making placement decisions?

The pair comparison judgment results are summarized in Table I-4.
Overall, test scores based on an achievement measure (both
"standardized" and "local CR") are given highest ratings. After
these come student grades, then written recommendations by teacher
and school counselor, respectively. Individual IQ test results
ranked below the previous five. Finally, considerably below IQ
scores was the parents' recommendation. Perhaps teachers have
had much experience with parents' judgments of their child's
capacity, and have found it wanting.

That performance-based measures should be accorded high ranks
seems reasonable, given that prior performance -- such as grade
point average -- is typically the best single predictor of future
performance. What is intriguing is the fact that IQ tests, though
a specialized performance measure, are possibly perceived as not
sufficiently relevant to use in placement decisions, if other
alternatives exist.

Summary

Test data are apparently more readily accepted if: (a) congruent
with known nontest data; or (b) high rather than low if incongruent.
That is, the so-called "under-achiever" (one who performs below the
level at which a test might indicate is possible) is perhaps slightly
more acceptable than is the notion of an "over-achiever." If given
their choice, the participants in this study would much rather have
grade-equivalent scores provided for their use than most others --
this in spite of the fact that possibly few people could give an
accurate paraphrase of how one may interpret a grade-equivalent
score. Finally, performance data are perceived as preferable to
nonperformance data for making sound placement decisions.
TABLE I-3

Summary of Scale Values for Test Score Types

Test Score Type               Scale Value      T-score

Grade-equivalent Score           6.15            61
Percentile Rank                  5.41         [illegible]
Raw Score (number right)         3.84            49
Scale Score (CAT-ADSS)           3.41            47
Stanine                          1.00            35

Mean                             3.96            50
S.D.                             2.00            10

NOTE: All values based on 143 cases.
TABLE I-4

Summary of Scale Values for Data Sources

Data Source                              Scale Value      T-score

Standardized achievement test scores        6.64            57
Local CR achievement scores                 6.59         [illegible]
Grades or marks                             6.43            56
Teacher's written recommendation         [illegible]     [illegible]
Counselor's written recommendation          5.45            51
Individual IQ test                          4.61            47
Parents' description of child's
  accomplishments                           1.00            29

Mean                                        5.22            50
S.D.                                        2.00            10

NOTE: All values based on 143 cases.
These results suggest that teachers are not only willing to make
use of test data, but might actually prefer it to other data types.
However, if some of the outcomes observed in this study carry over to
the classroom, the reader may well wonder whether test data are
being used in a sound fashion. The challenge to both researchers
and publishers should be clear: To develop a useful and sound means
for teachers to move from pupil results to considered decisions which
will best facilitate each child's educational success.


Substudy II

Teachers' Interpretations of a Pupil Performance Record

The results of substudy I suggest that certain score scale types
are preferred by teachers to others, as are certain sources of pupil
performance information. Also, the perceived validity of test scores
will vary as a function of the congruence of test and nontest score
data, as was found in another study by Frederickson and Marchie (1966).
Farr and Griffin (1973) and, more recently, Newman and Stallings (1982)
suggest that teachers' awareness of sound measurement practice may
have implications for their classroom assessment and decision-making
behavior. This study was initiated to answer two specific aspects
of how teachers interpret pupil performance record data: (a) What
type(s) of performance indicator do teachers use in drawing initial
judgments of pupil performance? (b) What type of performance
indicator do teachers believe to be the most trustworthy? These two
research questions serve to define the scope of the present study. The
results of this brief, exploratory study were used to shape the work
described in substudies V and VI.

Methodology

Sample

Participants were 210 public school teachers from sixteen different
school districts in Mississippi. These participants were in attendance
at a workshop on interpretation of test score data, which was a part of
a week-long workshop on test development. About 76% were female and
24% were male. The school districts represented were from the western,
central, and southern portions of the state.

Instruments

Data for both questions came from participants' responses to two
items which followed a hypothetical pupil performance record. There
were two versions of the protocol used, varying primarily in terms
of what IQ score was affixed to the record. Other performance data
included semester grade averages and standardized achievement test
scores, expressed in percentiles and scale scores. (NCE scores were
also included on the first record.) The hypothetical pupil records
are included in Appendix D.

For each record, respondents were told that there were no "right"
or "wrong" answers and that they should respond on the basis of their
own beliefs.

Specific instructions were:

    The following information has come from an anonymous student's
    cumulative record. Please examine it carefully and answer the
    questions which follow.

All respondents were able to complete the task easily within ten
minutes.



Results 

Question I : What type(s) of performahce indicator dp teachers use 
in drawiTiyfci initial judgments of pupil performance? 

'\ * 
Answers to item 1 were coded so that an answer of "Well above her 
ability 11 was coded as a 3, "About equal to. her ability" as a,2 and 
"Well below her ability" Was coded as* a 1. 

Differences on the first item responses between protocol groups
are summarized in Table II-1. The effect size of the difference in
ratings was 0.75 standard deviations (based on pooled variance
estimate), which was statistically significant at the .05 level
(F(1, 208) = 35.31). Because the pupil performance protocols differed
primarily on the stated IQ score, a reasonable conclusion is that one
of the least favored score types, IQ, is given most weight in judging
performance relative to "ability." A second possible interpretation is
that the respondents paid close attention to the directions and
concluded, correctly, that IQ data was the measure most indicative of
ability. However, in most tests and measurements courses, the concept
of errors of measurement is presented; so-called "normal" ranges for IQ
are generally described as between 90 and 110. The results suggest that
these two pupil records are not at all perceived as equivalent in
ability.

                              TABLE II-1

              Summary Statistics for Protocol Groups on
                        Performance Judgment

                          Group 1               Group 2
                     (Low IQ Protocol)     (High IQ Protocol)

Mean                       2.23                  1.82
Standard deviation         0.42                  0.48
n                            66                   144

Question 2: What type of performance indicators do teachers believe
to be the most trustworthy?

A summary of the options presented in item 2 suggests that
teachers believe, and very likely are correct, that grade point
average is the most reliable of the performance indicators listed
on the protocol (50%). The next most popular choice was that of
achievement test scores (31%). The IQ scores were about even in
their rate of selection, about ten percent each (19% total). A
chi-square test of independence for performance score choice and
assigned protocol yielded no significant relationship (corrected
chi-square = 1.82; 3 df; probability = .615). Thus, regardless of
this hypothetical student being considered a relative "underachiever"
or "overachiever," teachers did not vary their perception of grade
average as the most reliable of the given performance data.

Finally, similar contrasts for questions 1 and 2 were conducted
for teacher gender, test course status (yes or no: Have you ever
taken a course in tests and measurements?), certification (A, which
is a B.S. level; AA, which is the M.S. or M.Ed. level; and AAA, which
represents the Ed.S. level of training) or level of teaching
(elementary, secondary or both). In all cases, no statistically
significant differences were detected.



Summary 

A considerable number of the participants judged the hypothetical
student as an "overachiever" or an "underachiever," depending upon
which of two performance protocols was assigned. These judgments
would appear to be based primarily upon the listed IQ score in relation
to the other performance data. Yet, IQ was listed as the least
reliable of the types of information available and, from substudy I,
is one of the less-preferred data sources. Given that teachers can
form opinions of pupil performance, the questions which remain to
be answered are: (a) Do these pre-conceived judgments of a pupil
transfer to decisions made about the pupil? (b) Would these judgments
be made on the job, or only in contrived tasks such as the one used
in the present study? (c) How often do teachers believe their judgment,
however formed, might be incorrect? Finally, (d) How long would a
teacher have to observe a pupil in order to alter an initial judgment
of the child if that judgment was incorrect?

These questions, if investigated, could serve to support the
formation of what might be considered essential training in sound
placement and decision-making principles for educators.

Substudy III

Teachers' Interpretations of Point and Interval Score Estimates

Among the possible conclusions of substudy II was the idea that
teachers may not understand or may ignore the concept that test data
are subject to "wobble" and that scores which are different may not be
significantly different. Most measurement texts (e.g., Hills, 1981;
Anastasi, 1976) suggest that confidence bands represent more
realistically the degree of precision with which a test can estimate an
examinee's true skill level. As a test's reliability increases, the
resulting confidence bands for any given confidence level will decrease
in their width. Thus, wide confidence bands should be a tip-off to
relatively low test reliability; armed with this information, users
should be wary of placing considerable stock in wide confidence bands.

Morse (1964) investigated the differences in how undergraduate
students, early in a course on tests and measurements, interpreted
point and interval estimates relative to the mean score. His findings
suggest that interval estimates (confidence bands) were more likely to
be judged as closer to the mean than were point estimates (individual
percentile ranks). Further, the phenomenon was more pronounced for
"wide" confidence bands (± one standard deviation) than for "narrow"
(± one-half standard deviation).

The present study was initiated to answer three specific
questions: (a) Do practicing educators interpret point and interval
score estimates differently? (b) Does the width of an interval
estimate result in different perceptions? (c) Are there identifiable
trends in the interpretations of these scores? The answers to these
questions would have implications for both reporting practice and
possibly for pre-service or in-service training needs of educators.



Methodology

Sample

Participants were 105 public school teachers from Mississippi,
representing eleven different school districts. Of these,
approximately 78% were female and 22% were male. The participants were
attending a workshop on interpretation of test scores, which was part
of a larger workshop on test development. The school districts
represented were from the western, central and northeastern regions of
the state.

Instruments

A single instrument was used to gather the data for questions 1-3.
This instrument consisted of four sets of nine scores, either
percentile ranks or percentile bands, one set to a page. To the side of
each score was a rating scale which ranged from 1 to 5 for which the
following key was given:

     5 = Score is well above mean.
     4 = Score is somewhat above mean.
     3 = Score is equal or nearly equal to mean.
     2 = Score is somewhat below mean.
     1 = Score is well below mean.

Overall directions for the task were as follows:

     Directions

     On the following sheets, you will find a number of test
     scores, expressed as percentile ranks or percentile bands.

     Your percentile rank tells the percentage of a norm
     group that you have equaled or surpassed. For example, if
     your percentile rank for height in this class is 75, then you
     are as tall or taller than 75% of the persons in the class.

     Because test scores tend to vary somewhat due to such
     chance factors as a lucky guess or the choice of questions,
     we sometimes express a score as a percentile band. The
     percentile band 50-75, for example, would mean that we are
     reasonably confident that the person earning this score is
     really better than the lower half of the group, but not as
     good as the top quarter of the group.

     When the signal is given, open your booklet to page 1,
     and begin to work. Be sure that you finish each page before
     going on to the next page. DO NOT TURN BACK TO A PAGE ONCE
     YOU HAVE LEFT IT. WAIT FOR THE SIGNAL TO START.

The scores selected represented values of -2.0, -1.5, -1.0, -0.5,
0.0, +0.5, +1.0, +1.5, and +2.0 standard deviations from the mean,
expressed as percentile ranks or as 67% confidence bands for various
reliabilities. The wide confidence band, defined as ±1.0 standard
deviations, assumes a test with zero reliability (i.e., standard error
of measurement = standard deviation). The narrow confidence band,
defined as ±0.5 standard deviations, assumes a test with reliability
of .75. The very narrow band, defined as ±0.33 standard deviations,
assumes a test reliability of .89. If one assumes that teachers make
decisions from standardized achievement tests, then the very narrow
band might be a realistic interval estimate. For locally constructed
tests, the narrow or a "mid-wide" band might be more realistic.
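
The reliabilities quoted for the three band widths are consistent with the classical relation SEM = SD x sqrt(1 - reliability). The following Python sketch illustrates that relation; the function names are hypothetical, and the conversion from standard deviation units to percentiles assumes normally distributed scores, which the report does not state. The bands actually printed on the instrument are in Appendix E and may have been rounded differently.

```python
from math import erf, sqrt

def reliability_from_band(half_width_sd):
    """Reliability implied by a +/-1 SEM band of the given half-width (in SD units)."""
    # Classical test theory: SEM = SD * sqrt(1 - r), so r = 1 - (SEM / SD) ** 2.
    return 1.0 - half_width_sd ** 2

def percentile(z):
    """Percentile rank of a z-score, assuming normally distributed scores."""
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))

def percentile_band(z, half_width_sd):
    """Lower and upper percentile limits of a band centred on z."""
    return percentile(z - half_width_sd), percentile(z + half_width_sd)

# Band half-widths named in the text: wide, narrow, very narrow.
for label, half_width in [("wide", 1.0), ("narrow", 0.5), ("very narrow", 1.0 / 3.0)]:
    r = reliability_from_band(half_width)
    low, high = percentile_band(1.0, half_width)
    print(f"{label:12s} reliability = {r:.2f}  band for z = +1.0: {low:.0f}-{high:.0f}")
```

The loop uses a score one standard deviation above the mean only as an example; any of the nine score values could be substituted.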

The order of the scores was randomly determined and kept constant
for each set. The sequence of the sets was randomized for all
participants so as to avoid any order effects from biasing the results.

The ratings were then summed selectively for each set. Ratings on
the four scores or bands based on the scores above the mean were summed
for an "above the mean" total for each set. Similarly, ratings on the
four scores or bands below the mean were summed for a "below the mean"
total for each set. The rating scale being from one to five, each
summed value had a potential range of from four to twenty. High values
would suggest a perception of the scores being above the mean, while
low values would indicate a perception of the scores being below the
mean.

Internal consistency reliability estimates for the instrument
were: (a) for the individual scores, alpha = .85 (k = 36); (b) for the
"below the mean" sums, alpha = .66 (k = 4); and (c) for the "above the
mean" sums, alpha = .78 (k = 4). A copy of the complete instrument is
presented in Appendix E.

All participants were able to complete the instrument easily
within thirty minutes.

Results

Question 1: Do practicing educators interpret point and interval score
estimates differently?

A repeated-measures analysis of variance (ANOVA) was calculated
for the four below the mean sums (one from the percentile rank set, one
from the 1/3 S.D. band set, one from the 1/2 S.D. band set and one from
the 1 S.D. band set). The results are presented in Table III-1. A
statistically significant between-sets F-ratio was obtained, which
suggests that, for the below the mean scores, there was a difference in
how near to or far from the mean point and interval estimates were
perceived to be. A similar analysis was calculated for the four above
the mean scores, and it also resulted in a statistically significant
between-sets F-ratio. The summary of that ANOVA contrast is presented
in Table III-2.
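
As a quick arithmetic check, the between-sets F-ratios in Tables III-1 and III-2 can be recomputed from the tabled sums of squares. The sketch below is illustrative only; the helper name is hypothetical, and small differences from the printed values are to be expected because the tabled figures are rounded.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (mean square effect, mean square error, F) for one effect."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

# Degrees of freedom for a one-way repeated-measures design.
n_persons, n_sets = 105, 4
df_sets = n_sets - 1                          # 3
df_residual = (n_persons - 1) * (n_sets - 1)  # 312

# Sums of squares (between sets, residual) as tabled.
tables = {
    "below 50 (Table III-1)": (95.00, 256.25),
    "above 50 (Table III-2)": (161.57, 309.93),
}

for name, (ss_sets, ss_res) in tables.items():
    ms_sets, ms_res, f = f_ratio(ss_sets, df_sets, ss_res, df_residual)
    print(f"{name}: MS sets = {ms_sets:.2f}, MS residual = {ms_res:.2f}, F = {f:.2f}")
```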

Summary statistics for the summed scores are presented in
Table III-3. There is a systematic change within each score type. The
below the mean score sums tend to increase as the interval estimate
becomes wider. (A value of 12 would represent a rating of the scores
as being equal to the mean.) The opposite is true for the above the
mean scores. As the interval estimate becomes wider, the summed
ratings declined.

                              TABLE III-1

                Repeated Measures ANOVA Contrasts of
                  Sets of Percentile Ranks below 50

Source of Variation   Sum of Squares    df   Mean Square      F     Probability

Between Persons           248.89       104       2.39
Within Persons            351.25       315       1.12
  Between Sets             95.00         3      31.67       38.55      .000
  Residual                256.25       312       0.82
Total                     600.12       419

                              TABLE III-2

                Repeated Measures ANOVA Contrasts of
                  Sets of Percentile Ranks above 50

Source of Variation   Sum of Squares    df   Mean Square      F     Probability

Between Persons           478.01       104       4.60
Within Persons            471.50       315       1.49
  Between Sets            161.57         3      53.86       54.22      .000
  Residual                309.93       312       0.99
Total                     949.51       419

                              TABLE III-3

                    Perceived Distances of Scores
                        from 50th Percentile

                                       Score Set

                  Percentile    1/3 Standard     1/2 Standard      1 Standard
Scores               Rank      Deviation Band   Deviation Band   Deviation Band

Values below 50      5.?7        [illegible]         5.82             6.56
                    (0.78)                          (1.08)           (1.45)

Values above 50     17.71           17.36           17.53            16.??
                    (1.13)          (1.24)          (1.21)           (1.81)

Note: Values in parentheses are standard deviations; all values based on 105
respondents.



Question 2: Does the width of an interval estimate result in different
perceptions?

Orthogonal contrasts were calculated for each score type and
indicate, for the below the mean sums, that: (a) the interval estimate
ratings were perceived as significantly closer to the mean than the
point estimate (F = 50.69; p < .001); (b) there was no difference
between the very narrow and narrow interval estimates (F = 3.07;
p = .078); and (c) the narrow and very narrow interval estimates were
perceived as further from the mean than was the wide interval estimate
(F = 61.95; p < .001).

Very similar conclusions could be drawn for the above the mean
score set contrasts: (a) the point estimate was perceived as being
significantly further from the mean than were the interval estimates
(F = 39.39; p < .001); (b) there was no significant difference in
perception of the very narrow and narrow interval estimates (F = 1.55;
p = .211); and (c) the wide interval estimates were judged to be
significantly closer to the mean than were the narrow and very narrow
estimates (F = 121.77; p < .001).

Thus, while the teachers in this study did apparently interpret
point and interval estimates differently, they did not distinguish
systematically between the very narrow and narrow confidence bands.
The wide interval bands, though, were perceived as significantly closer
to the mean than the other two interval estimates.

Question 3: Are there identifiable trends in the interpretations of
these scores?

Orthogonal tests of trend were calculated using polynomial
coefficients from Winer (1971). The results of these contrasts are
presented for the below the mean scores in Table III-4 and for the
above the mean scores in Table III-5.

For the below the mean scores, there was a significant linear
trend and an arguable quadratic trend (F = 5.36; p = .020) beyond the
linear trend, depending upon the reader's preferred level of
significance. The cubic trend was not statistically significant. For
the above the mean scores, the linear, quadratic and cubic trends were
statistically significant. These trends are illustrated in
Figures III-1 and III-2, respectively.
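
Because the three trend components are orthogonal and each carries one degree of freedom, their sums of squares should add up to the between-sets sum of squares from the corresponding ANOVA. The Python sketch below checks this against the values in Tables III-1, III-2, III-4 and III-5. The coefficients shown are the standard ones for four equally spaced levels; that these exact coefficients were the ones taken from Winer (1971) is an assumption, not something the report states.

```python
# Orthogonal polynomial coefficients for four equally spaced levels
# (standard values; their use here is an assumption about the analysis).
COEFFS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}

def dot(u, v):
    """Dot product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))

# Each pair of contrasts is orthogonal: the dot product of coefficients is 0.
names = list(COEFFS)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert dot(COEFFS[a], COEFFS[b]) == 0

# Tabled mean squares (each trend has 1 df, so MS equals SS).
trend_ms = {
    "below mean": {"linear": 88.46, "quadratic": 4.40, "cubic": 2.14},
    "above mean": {"linear": 109.71, "quadratic": 28.81, "cubic": 23.05},
}
ss_between_sets = {"below mean": 95.00, "above mean": 161.57}
ms_residual = {"below mean": 256.25 / 312, "above mean": 309.93 / 312}

for key, parts in trend_ms.items():
    total = sum(parts.values())
    print(f"{key}: trend SS total = {total:.2f} "
          f"(between-sets SS = {ss_between_sets[key]:.2f})")
    for trend, ms in parts.items():
        # F for each trend is its mean square over the residual mean square.
        print(f"  {trend:9s} F = {ms / ms_residual[key]:.2f}")
```

The recomputed F values differ slightly from the tabled ones in the second decimal place, which is consistent with rounding in the printed mean squares.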

Figure III-1 is suggestive of a linear trend for the below the
mean scores, in which ratings approach the mean as one moves from a
point estimate to increasingly wider interval estimates. Figure III-2
is suggestive of a cubic trend, thanks mostly to a dramatic change for
the wide band ratings. Again, as one changes from a point estimate to
increasingly wider interval estimates, the assigned ratings decline
towards the mean. From these data, it is clear that trends in the
interpretations of given scores can be identified, and the shape of the
trend depends upon whether the scores are below or above the mean.

                              TABLE III-4

           Summary of Tests of Trend for Scores Below Mean

Trend           df        MS          F       Probability

Linear           1       88.46      107.74       .000
Quadratic        1        4.40        5.36       .020
Cubic            1        2.14        2.60       .104
Residual       312        0.82

                              TABLE III-5

           Summary of Tests of Trend for Scores Above Mean

Trend           df        MS          F       Probability

Linear           1      109.71      110.49       .000
Quadratic        1       28.81       29.01       .000
Cubic            1       23.05       23.21       .000
Residual       312        0.99

Summary 

While the task in this study was contrived, the data suggest some
very interesting conclusions may be drawn. First, educators (if the
sample used in this study is at all representative of teachers
elsewhere) do interpret point estimates and interval estimates
differently. The general trend was to perceive a score as being closer
to the mean when presented in increasingly wider interval estimates.
In other words, these teachers tended to give systematic overestimates
of scores below the mean and underestimates of scores above the mean
when those scores were presented in interval band form. On the one
hand, this is not unreasonable when the test reliability is zero, as
the best point estimate for a randomly selected individual is the group
mean. However, for the narrow and very narrow intervals, which
represented reliabilities of .75 and .89, respectively, such an
interpretation strategy is clearly inappropriate. This brings us to the
second conclusion, that these teachers did not demonstrate an
understanding of how a confidence band should be interpreted. Finally,
since confidence bands better express the degree of accuracy with which
human performance may be measured, reporting procedures may require a
thorough examination if the producer wishes folks to draw appropriate
interpretations from the data.

Substudy IV

Teachers' Estimates of Costs and Losses in Decisions

Moving from an interpretation to some definite action requires
that a decision be made. The quality of a decision will depend upon
the quality of information available for processing, as well as the
capability of the individual to interpret and integrate the
information in a sound manner. Thus, the perceived quality of information
available to teachers will, apart from their skill at interpreting
it, affect the kinds of decisions which teachers make. This conclusion
is underscored by the work of Fleming and Antonen (1971), Goslin
(1967), Rudman et al. (1980) and Kellaghan, Madaus and Airasian (1982).
A second, personal factor which might affect behavior is the
perception of how probable a correct decision might be. Finally, the
perceived consequences or risks of an incorrect decision may well
affect the choices which people make (Kahneman and Tversky, 1973).

The present study was initiated to answer three specific
questions related to the quality of information and decision
likelihoods: (a) How accurately do teachers believe standardized
achievement test scores estimate pupil skill? (b) What outcomes or
decisions are perceived as having greater import? (c) What do
teachers perceive to be the likelihood of making incorrect decisions?
These questions serve to outline the focus of the study.



Methodology 

Sample

Participants were 215 public school teachers from fourteen
different school districts in Mississippi. These districts
represented the western, southern, central and northeastern regions
of the state. Approximately 80% of the sample were females and
about 20% were males. These participants were in attendance at a
week-long workshop on test development.

Instruments

Data for the first question came from a three-response task
asking participants to judge the percent of students whose test
scores on the California Achievement Test (CAT) represent an
accurate reflection of their true skill; the percent who receive a
too-low score; and the percent who receive a too-high score. The
directions reminded the participants that these three values should
sum to 100%. This measure is represented by items 1-3 of the
booklet presented in Appendix F.

Data for question two came from one of two sets of six
forced-choice stimuli for which participants were asked which of two
outcomes they believed to be worse. These two sets differed in that one
posed the question for atypical students (either very good or very
poor), while the second posed the question for average students.

An example of the "loss ratio" items is:

     Select the statement which you believe is the worse of the
     pair of statements.

     Which is WORSE:

     a. Accidentally placing a poor student in an advanced group or
        class.

     b. Accidentally placing a good student in a remedial group or
        class.

The two sets of stimuli are contained in the first and second
booklets in Appendix F. In each booklet, the forced-choice stimuli
are items 4-9.

Data for the third question came from one of two sets of seven
forced-choice stimuli for which participants were asked which of two
outcomes they believed to be the more likely. These two sets
differed in that one posed the question for atypical students
(either very good or very poor), while the second posed the question
for average students.

An example of the "likelihood" items is:

     Select the statement which you believe is the MORE
     LIKELY of the pair of statements to occur. For each
     question, the student is of AVERAGE achievement level.

     Which is MORE LIKELY:

     a. A student performs very well on a classroom test.

     b. A student performs very poorly on a classroom test.

The two sets of stimuli are contained in the first and second booklets
in Appendix F. In each booklet, the forced-choice stimuli are
items 10-16.

Participants completed the entire booklet in a single session.
All participants were able to complete the three parts easily within
twenty-five minutes.



Results 



Question 1: How accurately do teachers believe standardized
achievement tests estimate pupil skill?

Overall, the mean estimate for the percent of accurate scores was
62.6. Mean estimates of the frequency of too-low scores and too-high
scores were 22.5 and 14.7, respectively. This suggests that these
teachers believe that standardized achievement tests are on target
about two-thirds of the time. Further, when the tests are believed
inaccurate, the perceived tendency is to err towards an
unrealistically low rather than unrealistically high score. A multivariate
analysis of variance (MANOVA) was calculated to compare these
estimated percentages among teachers of different gender, test
course status, certification and teaching level.

The dependent variables chosen were the first two percentages
(accurate and too-low). The reason for not including all three was
the fact that forcing the values to sum to 100 introduces a
dependency; that is, respondents only had two degrees of freedom
in their selection. Independent variables included gender; test
course status (whether or not the participant had ever taken a course
in tests and measurements); certification (A, representing a B.S.
level; AA, representing an M.S. level; or AAA, representing an Ed.S.
level of coursework); and teaching level (elementary, secondary or
both). A summary of the main effects MANOVA contrasts is presented
in Table IV-1.

In each case, there was no statistically significant difference 
among the contrasted groups. Interactions, not presented in the 
Table IV-1, were also not significant. 

Thus, the perceived frequencies of right or wrong results 
coming from a specific achievement test were similar regardless 
of respondent gender, test course status, certification or teaching 
level. 

Question 2: What outcomes or decisions are perceived as having
greater import?

The results of the forced-choice instrument measuring perceived
losses associated with incorrect decisions are summarized in
Table IV-2. Each of the items forced a choice between a false
positive (e.g., a student passing a test when he or she did not
know the material) or a false negative outcome (e.g., a student
failing a test when he or she did in fact know the material).
The tabled percentages represent the frequency that a particular
outcome was selected as worse.

In general, there was congruence between the observed percentages
for the atypical and average student sets. The types of outcomes can
be conveniently divided into two classes: test results or decisions
and instructional outcomes. The overall ratio of false positive (FP)
to false negative (FN) selections (called a loss ratio) was markedly
affected by which class of outcomes was examined. For test
results, the loss ratios were 0.46 for atypical students and 0.78
for average students. This suggests that the participants believed
the worse outcomes for students to be test scores or decisions which
underestimate rather than overestimate the true level of performance.
From a study using seventh grade students, Morse (1977) found that
students would tend to agree. Their perceived loss ratio was 0.47.

The resulting loss ratio for the instructional outcomes was
quite different, though. For the atypical student item set, the
resulting FP/FN value was 4.72, while for the average student set,
the value was 10.40. The participants believed that false positive
outcomes are considerably worse than false negative outcomes for
students. That is, the teachers would apparently choose to err in
the direction of holding the student back rather than pushing too
quickly. The marked difference between the loss ratios for the
atypical and average students suggests that the perceived disparity
in FP and FN instructional outcomes is seen as more severe for average
students. The loss ratio of the seventh grade students in Morse's
study (1977) was not nearly as dramatic a departure from the test
outcomes value, being 2.60.

Question 3: What do teachers perceive to be the likelihood of making
incorrect decisions?

The results of the forced-choice instrument measuring judged
likelihoods associated with incorrect outcomes or decisions are
summarized in Table IV-3. For these items, the congruence between
the judgments for the atypical and average student sets was much
closer than for the loss ratio items. A similar pattern of
different perceptions of test or performance versus instructional
outcome likelihoods was noted, though.

The judged likelihoods of incorrect test or performance outcomes
suggest that false negative outcomes are considered the more likely
(FP/FN = 0.54 and 0.64 for atypical and average students, respectively).
The picture reverses for instructional outcomes, in which false
positive outcomes are judged to be far more common (FP/FN = 2.51 and
3.75 for atypical and average students, respectively).

The estimates of too-low and too-high test performance, discussed
above in Question 1, give an independent check for test outcome
likelihood. For those data, the likelihood ratio (FP/FN) was 0.65,
which is congruent with the values obtained from the likelihood item
sets.

The estimates from this instrument, as should be obvious, are
of relative error likelihood, as opposed to absolute likelihood
judgments. The task used in Question 1, though, was an absolute
judgment task. That its results were congruent with the relative
judgments from the forced-choice instrument suggests that switching
to judgments of absolute likelihoods might not alter the relative
error estimates.
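
The likelihood ratios quoted above can be approximately recovered from the item proportions in Table IV-3. The report does not say how the items were pooled; the Python sketch below assumes the false positive proportions were summed over the relevant items and divided by the summed false negative proportions, an assumption that reproduces the reported ratios to within rounding. The function and variable names are hypothetical.

```python
def pooled_ratio(rows, items):
    """Sum of false-positive proportions over sum of false-negative proportions."""
    fp = sum(rows[i][0] for i in items)
    fn = sum(rows[i][1] for i in items)
    return fp / fn

# (false positive, false negative) proportions by item number, from Table IV-3.
atypical = {1: (.20, .80), 2: (.38, .62), 3: (.43, .57), 4: (.30, .70),
            5: (.88, .12), 6: (.55, .45), 7: (.45, .55)}
average = {1: (.53, .47), 2: (.10, .90), 3: (.16, .84), 4: (.74, .26),
           5: (.90, .10), 6: (.68, .32), 7: (.42, .58)}

TEST_ITEMS = (1, 2, 3, 4, 7)       # test or performance outcomes
INSTRUCTION_ITEMS = (5, 6)         # instructional outcomes

for label, rows in [("atypical", atypical), ("average", average)]:
    test_ratio = pooled_ratio(rows, TEST_ITEMS)
    instr_ratio = pooled_ratio(rows, INSTRUCTION_ITEMS)
    print(f"{label}: test FP/FN = {test_ratio:.2f}, "
          f"instructional FP/FN = {instr_ratio:.2f}")

# Independent check from Question 1: mean too-high estimate (false positive)
# over mean too-low estimate (false negative).
hit_rate_ratio = 14.7 / 22.5
print(f"Question 1 check: FP/FN = {hit_rate_ratio:.2f}")
```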



Summary 

Teachers apparently have at least a modicum of faith in
standardized achievement tests, at least in the accuracy of the resulting
scores. When incorrect results arise, they are perceived as being more
often lower rather than higher than appropriate. Test theory suggests
that, if necessary assumptions hold, errors of measurement are random
rather than systematic (Lord and Novick, 1968). Perhaps this opinion
reflects personal observations of some students being unable to perform
well.

A more important conclusion is that false negative outcomes are
perceived as less desirable than false positive outcomes for test
decisions, yet for instructional outcomes, a false positive outcome is
considered much worse than a false negative. These two observations
suggest that there is a perception of test decisions somehow being
independent of instructional decisions or outcomes. In other words,
the link between tests as an example of controlled assessment and
subsequent instructional decisions for pupils is either not perceived
as important or is ignored. Either way, these data suggest an
incoherent system: the preferred error for testing is to pass the
student who doesn't have the skills, but the error of choice for
instruction is to hold back students who do have the requisite skills.

The tabulation of likelihood estimates again suggests that these
teachers (and teachers in general, if this sample is at all
representative of other teachers) are operating in an incoherent system, as a
Bayesian statistician would use the term (Novick and Jackson, 1974).
In order to minimize overall "cost" or "loss" to a system, the
appropriate strategy is to alter likelihoods of outcomes so that the
products of loss ratios and likelihoods are at a minimum. Yet, these
data suggest that the most costly, or the least desirable, decisions or
outcomes are considered to be the most likely outcomes. (The reader
should note that these are relative error rates being discussed and not
absolute rates.)
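
The idea of minimizing the products of losses and likelihoods can be illustrated with a small expected-loss calculation. The Python sketch below uses entirely hypothetical numbers, chosen only to show the principle, and is not drawn from the study's data.

```python
def expected_loss(p_fp, p_fn, loss_fp, loss_fn):
    """Expected loss: each error's probability times its cost, summed."""
    return p_fp * loss_fp + p_fn * loss_fn

# Hypothetical costs for illustration only (not from the report):
# a false positive is judged four times as costly as a false negative.
loss_fp, loss_fn = 4.0, 1.0

# Two hypothetical decision rules with the same total error rate (20%).
# The first makes the costlier error more often, the second less often.
costly_error_likely = expected_loss(0.15, 0.05, loss_fp, loss_fn)
costly_error_rare = expected_loss(0.05, 0.15, loss_fp, loss_fn)

print(f"costlier error more likely: expected loss = {costly_error_likely:.2f}")
print(f"costlier error less likely: expected loss = {costly_error_rare:.2f}")
```

With the same overall error rate, the rule that makes the costlier error less likely has the smaller expected loss, which is the sense in which a system where the worst outcomes are also the most likely ones is not minimizing its losses.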

One possible hypothesis is that the error which is observed most
often is that which becomes judged as the more severe. If true, this
hypothesis would serve to explain, in large part, the observed results.
However, the patterns observed in the judgments suggest that an
alternative hypothesis, that generally incoherent decision-making schemes
are in effect in education settings, must also be considered as a
possibility.

                              TABLE IV-1

        Summary of MANOVA Contrasts on "Hit Rate" Estimates

Contrast        Wilks' Lambda    Approximate F      df      Probability

Gender              .959             2.84          2,132       .062
Test Course         .992             0.51          2,132       .601
Certificate         .979             0.69          4,264       .596
Level               .982             0.58          4,264       .674

                              TABLE IV-2

                         Teacher Loss Ratios
                         by Type of Student

A. For atypical students:

                                                     Worse Outcome
Circumstance                                 False Positive   False Negative

1. Incorrect placement                            41%              59%
2. Test performance                               23%              77%
3. Moving on vs. remaining with material          85%              15%
4. Speed of presentation of material              76%              24%
5. Test performance; minimal P-F                  31%              69%
6. Outcomes of incorrect instructional
   decisions                                      87%              13%

Summary:  Test results or decision (1,2,5):   FP/FN = 0.46
          Instructional outcomes (3,4,6):     FP/FN = 4.72

B. For average students:

                                                     Worse Outcome
Circumstance                                 False Positive   False Negative

1. Incorrect placement                            37%              63%
2. Test performance                               47%              53%
3. Moving on vs. remaining with material          90%              10%
4. Speed of presentation of material              90%              10%
5. Test performance; minimal P-F                  47%              53%
6. Outcomes of incorrect instructional
   decisions                                      95%               5%

Summary:  Test results or decision (1,2,5):   FP/FN = 0.78
          Instructional outcomes (3,4,6):     FP/FN = 10.40

Note: Values for parts A and B are based on 143 and 72 respondents, respectively.



TABLE IV-3

Teacher Likelihood Estimates
by Type of Students

A. For atypical students:

                                                         Worse Outcome
Circumstance                                      False Positive   False Negative
1. Student work is atypical                            .20              .80
2. CAT score is opposite expectation                   .38              .62
3. CAT score is too far in same direction              .43              .57
4. Classroom test performance is opposite
   expectation                                         .30              .70
5. Semester grade is opposite expectation              .88              .12
6. Placement is incorrect                              .55              .45
7. Classroom test P-F status is opposite
   expectation                                         .45              .55

Likelihood ratios: Test or performance (1-4,7): FP/FN = 0.54
                   Instructional outcomes (5,6): FP/FN = 2.51

B. For average students:

                                                         Worse Outcome
Circumstance                                      False Positive   False Negative
1. Student work is atypical                            .53              .47
2. CAT score is opposite expectation                   .10              .90
3. CAT score is too far in same direction              .16              .84
4. Classroom test performance is opposite
   expectation                                         .74              .26
5. Semester grade is opposite expectation              .90              .10
6. Placement is incorrect                              .68              .32
7. Classroom test P-F status is opposite
   expectation                                         .42              .58

Likelihood ratios: Test or performance (1-4,7): FP/FN = 0.64
                   Instructional outcomes (5,6): FP/FN = 3.75

Note: Values for parts A, B based on 143 and 72 respondents, respectively.
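
The likelihood ratios reported in Table IV-3 can be approximated from the item values. The sketch below assumes that each ratio is the sum of the false-positive proportions divided by the sum of the false-negative proportions over the listed items; the report does not state the computation, so this is an illustration and not the author's documented procedure. The names ATYPICAL, AVERAGE and likelihood_ratio are hypothetical.

```python
# Item values as printed in Table IV-3: item number -> (FP, FN).
ATYPICAL = {1: (.20, .80), 2: (.38, .62), 3: (.43, .57), 4: (.30, .70),
            5: (.88, .12), 6: (.55, .45), 7: (.45, .55)}
AVERAGE = {1: (.53, .47), 2: (.10, .90), 3: (.16, .84), 4: (.74, .26),
           5: (.90, .10), 6: (.68, .32), 7: (.42, .58)}

TEST_ITEMS = (1, 2, 3, 4, 7)      # "Test or performance" items
INSTRUCTIONAL_ITEMS = (5, 6)      # "Instructional outcomes" items


def likelihood_ratio(table, items):
    """Assumed formula: summed FP proportions over summed FN proportions."""
    fp_total = sum(table[i][0] for i in items)
    fn_total = sum(table[i][1] for i in items)
    return fp_total / fn_total


for label, table in (("atypical", ATYPICAL), ("average", AVERAGE)):
    print(label,
          round(likelihood_ratio(table, TEST_ITEMS), 2),
          round(likelihood_ratio(table, INSTRUCTIONAL_ITEMS), 2))
```

Under this assumption the computed ratios agree with the tabled values to within rounding, which suggests (but does not confirm) that a simple ratio of pooled proportions was used.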

Substudy V

Factors Influencing Teacher Estimation of Pupil Performance


Educational decision-making is governed in large part by the
dynamics of how teachers interpret the performance information
available to them. Interpretations, once drawn, form the basis for action.
The question of concern here is: What factors may be shown to affect
interpretation and judgment of pupil performance?

Even though the now-infamous "Pygmalion" study (Rosenthal and
Jacobson, 1968) was widely critiqued for its lack of experimental
rigor, it raised the question of whether teachers' judgments
of pupil performance could be affected by inaccurate or unrelated
information. Fleming and Antonen (1971), in an attempt to replicate
the Pygmalion study, observed that teachers varied considerably
in the degree of accuracy they were willing to ascribe to different
types of test performance information. This perceived accuracy varied
as a function of the degree of utility which was attributed to the
particular type of performance information.

Teachers apparently temper their judgments of performance
information accuracy on the basis of other information. Frederickson
and Marchie (1966) noted that teachers asked to rate the validity
of a given test score for one of their pupils were much more likely
to accept score information congruent with their prior beliefs than
to accept incongruent score information. Leither (1976) found that
teachers, even when asked to avoid all extraneous data, had a marked
tendency to draw upon their knowledge of unrelated student background
information in order to interpret test performance information.

Examples of the types of extraneous information which have been
shown to affect teacher judgments are many. Perhaps one of the most
widely publicized is that of the pupil's name. Harari and McDavid (1973)
found significant differences in teacher ratings of the same student
work depending upon what name was attached to the work.

There is evidence to suggest that prior information does mediate
decisions made on follow-up information. Shavelson, Caldwell and Izu
(1977) noted that such decisions are determined in part by the congruence
of the follow-up information with initial data, as well as by the
reliability of the information. Further, the Shavelson et al. study suggests
that while perceptions of pupil capability or chances for success are more
readily altered than pedagogical decisions, the types of pedagogical
actions teachers report as best for a particular pupil do change as their
perceptions of the pupil's capability change.

Thus, if teachers' judgments do have an effect upon their behavior
towards pupils, it is important to examine factors which may contribute
to these judgments. The present study incorporated each of the factors
suggested by prior research results. Extraneous information was
represented by inclusion of the pupil's first name. Use of prior
information was represented by requiring two separate decision points,
the second coming after the presentation of follow-up information on
the pupil. Congruence of information was incorporated by matching
or mismatching initial and follow-up information (each was either
positive or negative). Reliability of information was represented
by providing quite different sources of information. In this way,
the study allowed the examination of the interactive effects of these
factors as they affected teacher decisions.

The specific questions which define the scope of the study were:
(a) Does pupil gender, initial appraisal, follow-up information or
reliability of information affect teacher decisions? (b) Are there
differences in teacher decisions attributable to gender, training,
certification or teaching level?



Methodology

Sample

Participants were 163 teachers with varying levels of experience.
Education levels were approximately evenly divided between undergraduate
training only (46%) and graduate degrees (M.S., 43%; Specialist degree,
11%). All respondents were participants in a training workshop and
voluntarily completed the instrument. Complete data were obtained
from 157 of the 163 teachers (96%).

Instruments 

The instrument used for this study was a slightly altered version
of that used in the Shavelson et al. study. Respondents were presented
with initial information for a "student" and were then asked to judge
the chances (between 0 and 100%) of the student obtaining all A's and
B's on the report card. The initial information varied by valence
(either positive or negative in the description of pupil's ability,
study habits and family background) and by gender (the student was given
either a male or female name, no surname supplied).

After judging the child's chances for success, the follow-up
information was presented. This information varied by valence (either
positive or negative in the descriptions of the child's achievement
and "attitude" towards school), and reliability (the information coming
from reliable and authoritative sources or from unreliable and
unauthoritative sources). Respondents were then asked to judge again,
in light of the follow-up information, the child's chances for success
in school.

The possible conditions thus formed a 2x2x2x2 factorial design,
with the additional repeated measures dimension of teacher judgment.
The factors were: initial information valence; pupil gender; follow-up
information valence; and follow-up information reliability.
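
As an illustration of the design just described, the sixteen booklet conditions can be enumerated by crossing the four two-level factors. This is a minimal sketch; the factor labels and the names FACTORS, booklet_conditions and is_congruent are hypothetical and do not come from the original instrument.

```python
from itertools import product

# The four two-level factors of the 2x2x2x2 design (labels are illustrative).
FACTORS = {
    "initial_valence": ("positive", "negative"),
    "pupil_gender": ("male", "female"),
    "followup_valence": ("positive", "negative"),
    "followup_reliability": ("reliable", "unreliable"),
}


def booklet_conditions(factors=FACTORS):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def is_congruent(condition):
    """True when initial and follow-up information share the same valence."""
    return condition["initial_valence"] == condition["followup_valence"]


conditions = booklet_conditions()
print(len(conditions))                            # number of design cells
print(sum(is_congruent(c) for c in conditions))   # cells with matching valence
```

Half of the cells pair initial and follow-up information of the same valence, which is what allows congruent and incongruent conditions to be contrasted.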

Other questions posed at the time of each judgment included:
(a) whether the textbooks to be chosen for the student should be at, at
or above, or below the student's grade level; (b) how the teacher would
react if the child hesitated in answering a question in class; and
(c) how important it was to praise the child every time he or she did
good work. These questions are referred to as the textbook,
questioning and reinforcement decisions, respectively. A copy of the
information paragraph types and a sample booklet are contained in
Appendix G.

The ordering of factor conditions was randomized prior to
distribution of the booklets. Each participant was able to complete
the task easily within twenty-five minutes.



Results 

Question 1: Does pupil gender, initial appraisal, follow-up
information or reliability of information affect teacher decisions?

Contrasts of initial judgments by valence and gender indicated a
significant information valence effect, but only trivial differences
due to pupil gender (F for valence = 268.91; F for gender = 0.86;
df = 1/153). These results are displayed in Table V-1. Because of
this, the initial judgments were used as a covariate for the contrasts
of follow-up judgment by all four factors. The summary information for
the ANCOVA contrasts of final probability estimates is contained in
Table V-2.

From the data in Table V-2, it is apparent that when follow-up
judgments are adjusted for initial judgments, pupil gender and
follow-up information valence were significant main effects (p = .011
and p < .001, respectively). The reliability of the follow-up
information, while not significant as a main effect, was part of significant
two-way interactions with both initial and follow-up information
valence (p = .002 and p < .001, respectively). No other interaction
was statistically significant.

Means for the differences in judgment (follow-up estimate -
initial estimate) by valence condition for male and female names are
presented in Table V-3. These means suggest several important results:
(a) When the two information sets were congruent in valence, the
differences were considerably smaller than when they were incongruent.
(b) Respondents were systematically favoring the male student over the
female in their judgment revisions. Positive mean changes in judgment

TABLE V-1

ANOVA Summary of Initial Probability Estimates

Source                             df    MS          F        Probability
Gender (G)                          1      277.49      0.86   .355
Initial information valence (I)     1    86210.66    268.91   .000
G x I                               1      660.40      2.06   .149
Residual                          153      320.58
Total                             156      873.07
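
The F values in Table V-1 can be checked against the tabled mean squares. The sketch below assumes the conventional fixed-effects ANOVA ratio, F = MS(effect) / MS(residual); the names MS_RESIDUAL, MEAN_SQUARES and f_ratio are hypothetical, and the figures are copied from the table.

```python
# Mean squares as printed in Table V-1.
MS_RESIDUAL = 320.58
MEAN_SQUARES = {
    "gender": 277.49,
    "initial_valence": 86210.66,
    "gender_x_valence": 660.40,
}


def f_ratio(ms_effect, ms_error):
    """Conventional F statistic: effect mean square over error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


f_values = {name: f_ratio(ms, MS_RESIDUAL) for name, ms in MEAN_SQUARES.items()}
for name, value in f_values.items():
    print(f"{name}: F = {value:.2f}")
```

The recomputed ratios agree with the tabled F values to within rounding of the printed mean squares.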








TABLE V-2

ANCOVA Summary of Final Probability Estimates

Source                                   df    MS          F        Probability
Covariate                                 1    15799.35     48.60   .000
Gender (G)                                1     2171.70      6.68   .011
Initial information valence (I)           1      671.70      2.07   .153
Follow-up information valence (F)         1    38294.21    117.81   .000
Follow-up information reliability (R)     1       89.15      0.27   .601
G x I                                     1      789.28      2.43   .121
G x F                                     1       84.31      0.26   .611
G x R                                     1      187.81      0.58   .488
I x F                                     1      166.74      0.51   .475
I x R                                     1     3277.68     10.08   .002
F x R                                     1    18187.49     55.95   .000
G x I x F                                 1       19.34      0.06   .808
G x I x R                                 1       82.28      0.25   .616
G x F x R                                 1       94.56      0.29   .591
I x F x R                                 1      363.28      1.12   .292
G x I x F x R                             1       14.663     0.45   .832
Residual                                140      325.06
Total                                   156      888.10

TABLE V-3

Mean Difference in Judgment by Information
Valence and Gender

Female Student
                                 Follow-up information
                                 -             +
Initial information    -       -12.10         28.57
                               (13.31)       (32.41)
                       +       -30.46          2.24
                               (28.16)        (5.95)

Male Student
                                 Follow-up information
                                 -             +
Initial information    -         0.65         35.29
                               (19.63)       (28.96)
                       +       -24.10          3.10
                               (26.25)       (10.54)

Note: Values in parentheses are standard deviations.

were larger for the male student, while negative mean changes were
larger for the female student. The size of this difference by gender
ranged from 0.86 percentage points for the dual positive valence to
12.75 for the dual negative valence. In fact, for the dual negative
valence, judgments for males were revised up about two-thirds of a
point while the mean judgment for females declined by over twelve
points.

Summary statistics for the significant two-way interaction
variables are contained in Table V-4. Where the initial and
follow-up data valences were congruent, the reliable follow-up
information resulted in differences about six or more times as large
as those for the unreliable follow-up information. For the incongruent
data valences, though, the reliable follow-up information resulted in
differences slightly over three times as large as those for unreliable
information.

Because of the unexpected impact which pupil gender had on the
outcomes, a follow-up study was planned. In this study, a total of
fifty-six public school teachers from two school districts in
southwestern Mississippi were asked to participate. The design of
the follow-up study was essentially the same, with two differences.
First, two female names, Carol and Susan, were used instead of a male
and a female name. Second, only reliable follow-up information was
presented. Thus, the study represented a 2x2x2 design of name by
initial information valence by follow-up information valence.
Fifty-four usable booklets were turned in.

The results, presented as an ANCOVA contrast of final probability
estimates using initial estimates as the covariate, are summarized in
Table V-5. No significant main effect other than follow-up valence was
observed (F = 94.15; p < .001). None of the interactions was
statistically significant. Thus, the observed differences due to gender in
the main study were apparently not due to selection of a disagreeable
female name. Whether it was caused, in part, by an especially
fortuitous choice of male name is still open to question.

Path analysis models were generated and tested for each of the
three decisions called for: textbook, questioning and reinforcement.
Following the Shavelson et al. approach, two interaction variables, SV1
and RV2, were created. SV1 represents the interaction of gender and initial
valence, while RV2 represents the interaction of reliability and
follow-up valence. However, the SV1 variable was not a significant
contributor to either initial prediction (PE1) or resulting decision
(TD1, QD1 or RD1), as suggested by the results in Table V-2. The
valence of initial information (V1) was used as the sole exogenous
variable for initial prediction, while gender (G) was used as one of
the two purely exogenous variables for the follow-up prediction (PE2).
Kenny (1979) outlines the mechanics of generating and testing path
models.



TABLE V-4

Mean Differences in Judgment by Information
Valence and Reliability

Reliable Information:
                                 Follow-up information
                                 -             +
Initial information    -       -12.95         50.83
                               (14.35)       (33.00)
                       +       -42.10          6.70
                               (28.81)        (8.09)

Unreliable Information:
                                 Follow-up information
                                 -             +
Initial information    -         0.83         14.25
                               (18.09)       (13.89)
                       +       -13.62         -1.19
                               (16.41)        (6.88)

Note: Values in parentheses are standard deviations.

TABLE V-5

ANCOVA Summary of Final Probability Estimates
from Follow-up Study

Source                                df    MS          F       Probability
Covariate                              1     2530.07     6.31   .016
Name (N)                               1        6.53     0.02   .899
Initial information valence (I)        1      444.97     1.11   .298
Follow-up information valence (F)      1    37767.23    94.15   .000
N x I                                  1      649.31     1.62   .210
N x F                                  1      601.12     1.50   .227
I x F                                  1      703.586    1.75   .192
N x I x F                              1      445.97     1.11   .297
Residual                              45      401.13
Total                                 53     1144.30

Figures V-1, V-2 and V-3 illustrate the path models which
represent the best fit to the sample data for the textbook decision,
the questioning decision and the reinforcement decision, respectively.
For the initial textbook decision, it is apparent that the initial
information valence effect overshadows that of the initial prediction
by a ratio of about two to one. The subsequent information, contained
in the RV2 variable, the gender (G) and the initial prediction all
combine to affect the follow-up prediction of success (PE2). As a
result, the follow-up decision is affected most strongly by the
follow-up prediction, followed by the RV2 variable and the initial text
decision (TD1). In this instance, both prior and collateral
information are being combined in the probability estimates and subsequent
decision.

The same cannot be said to hold for questioning strategy
decisions. As illustrated in Figure V-2, the choice of questioning
strategy is apparently unaffected by any variable other than initial
questioning decision (QD1). In other words, questioning strategy is
essentially invariant across the observed factors; teachers seem to
have a preferred style or strategy and choose not to alter it.

Reinforcement strategy decisions, though, were affected to a
degree by follow-up information. Figure V-3 illustrates that for both
initial and follow-up decisions (RD1, RD2), the prediction estimates
and purely exogenous variables all combined to affect the decision.

Question 2 : Are there differences in teacher decisions attributable to 
gender, training, certification or teaching level? 

An analysis of covariance, using initial probability estimates of
success as the covariate, was calculated in order to contrast the
various levels of the personal variables considered. The ANCOVA
results are presented in Table V-6. As is suggested by the figures in
Table V-6, none of the main effects examined -- gender, whether or not
coursework in tests and measurements had been taken, level of
certification (A, AA or AAA) or teaching level (elementary, secondary or
both) -- made a difference in the adjusted final probability estimates.
Because some of the two- and three-way interaction cells were empty for
this sample, only main effects were examined, and a pooled within-cell
variance estimate was used.



That teachers' judgments depend upon certain factors is apparently
a reasonable proposition. Teachers' responses in this study suggest
that they are sensitive to the congruence of new information with prior
information, the reliability of information, the gender of the student
and the valence of performance information. Why male names should be

Figure V-2

Path Model and Coefficients for Questioning Decision

Figure V-3

Path Model and Coefficients for Reinforcement Decision




TABLE V-6

ANCOVA Summary of Final Probability Estimates
for Teacher Groups

Source           df    MS          F       Probability
Covariate         4    15016.10    33.02   .000
Gender            1      369.14     0.81   .369
Test Course       1      201.24     0.44   .507
Certification     2      454.4      1.00   .371
Level             2     1087.64     2.39   .095
Residual        139      454.69
Total           149      877.96

systematically favored over female names is not clear. However, should
there be even the slightest link between this difference observed on an
artificial task and behavior in the classroom, then serious
consideration should be given to approaches by which such inequities may be
reduced. Second, as Shavelson et al. suggest, it appears that
teachers' decision-making, with the exception of questioning
strategy, is somewhat Bayesian in nature. Unfortunately, the
participants in this sample were not equitably Bayesian. Third, the link
between teacher perception of pupil capability and subsequent behavior
towards the students deserves further investigation. Fourth, teacher
characteristics do not appear to have any systematic effect on
judgments of a student's chances for "success" in school. These
results indicate that both relevant and irrelevant information are
incorporated in decision-making. One possible implication is that
teacher preparation should include training on sound decision-making.




Substudy VI 



Changes in Teachers' Knowledge and
Perceptions of Test Score Data

That teachers' knowledge and perceptions of test score data are
perhaps not as sound as desirable is a fairly common conclusion
(Hastings et al., 1960; Goslin, 1978; Rudman et al., 1980). However,
as Rudman and colleagues pointed out in their 1980 review of assessment
practices, not much has been done to investigate whether teachers'
knowledge can be changed by direct intervention, such as staff
development training sessions (e.g., in-service education). The results of
the previous substudies contained in this project report suggest that
no systematic differences in perceptions, use or interpretation of
performance data could be attributed to whether or not the participant
had taken one or more courses in tests and measurements. It may
well be the case that the topics traditionally covered in such
courses emphasize statistical concepts and treat the topics of
interpretation and use only lightly, if at all. A second possibility
is that the time separating the course work from the present is
simply too great to allow the retention of measurement concepts.

The present study was initiated to provide insight on two
specific questions: (a) Can teachers' knowledge or perceptions of test
score data be changed as a result of short-term, directed training?
(b) Are there differences in the degree of this change attributable
to teachers' measurement background, certification or teaching level?

Methodology

Sample

Participants in the study were 245 public school teachers from
a Mississippi school district. The school district offers instruction
from grades one to twelve. The racial mix of the teachers was about
75% white and 25% black. By gender, the percentage of females was
about 75%, that of males about 25%. The teachers within the system
represented a variety of teacher training institutions attended.
The town in which the school system is based has a population of
slightly over 15,000, making it a medium-size city for Mississippi.

Instruments

Four separate subtests were used in this study, two of which
related to perceptions of test score data while the others measured
knowledge of tests and test scores. These subtests are discussed
individually in detail below. A copy of all instruments is contained
in Appendix H.

Validity subtest. This subtest was the same as that used for the
first research question in Substudy I. It was comprised of one of two
sets of eight items, each of which gave test and nontest information
for a hypothetical student. The participant was then asked to judge
the given test score as being valid, questionable or invalid. An
example from the validity judgment subtest is:

A female student, sixth grade, average grade of D in reading and
social studies. New CAT-77 reading comprehension percentile
is 92. This score is:

a. Valid

b. Questionable

c. Invalid

A response of "Valid" was coded as three points, a rating of
"Questionable" as two and "Invalid" as one point. The ratings were then summed
across items. Thus, high scores represent a greater perceived validity
of test scores in both congruent and incongruent settings, while
low scores represent a lower degree of perceived validity. Overall
internal consistency reliability of this measure was .80.


Knowledge subtest. This subtest was designed to assess how well
participants could interpret both classroom and standardized
achievement test score data. Several of the ten items came from those used in
surveys of test coordinators' (Morse, 1978) and accountability
coordinators' (Hills, 1977) perceptions of teacher competence in measurement.
Two of the items came from the Newman and Stallings (1980) study.
Several others came from a course in measurement taught by the author.
All items had been thoroughly pretested. As used in the present study,
the raw score (number right) was the criterion variable. Overall
internal consistency reliability was .65, which compares favorably with
values reported by Hastings et al. (1960) and Newman and Stallings for
much longer tests.

Test-wiseness subtest. Understanding of sound item and test
construction practice should permit an examinee to detect and take
advantage of poorly constructed tests. In addition, test-wiseness
is a trait which has been shown to be trainable (Morse and Morse,
1980) for both those skills from the Millman, Bishop and Ebel (1965)
hierarchy which are independent of the test constructor and those
dependent upon the test constructor. The set of fourteen items was
drawn from a study in which Morse (1980) found that the test-wiseness
skills dependent upon test and test constructor were significantly
more difficult to apply successfully than were the skills independent
of test and test constructor (the population used in that study was
fifth and sixth grade students). The selected items were therefore
evenly balanced for test-dependent and test-independent skills. An
example of a test-wiseness item is:

If something is inflammable, it will

a. resist burning

b. not catch on fire

c. not be consumed by flames

*d. easily ignite

The skill required here is to avoid selection of responses a, b
and c since they each imply a similar result. Choice d, being
unique, is the preferred selection. Simple raw score was used.
Overall internal consistency reliability of this measure was .77.

Preference subtest. This subtest was taken from the card sort
task used by Goslin (1967) and originally used by Hastings et al.
(1960). It is comprised of twenty-eight "records," each containing
some combination of test and nontest information for a hypothetical
student. For each case, the participant was asked to judge whether
the high-school student should be placed in a regular or advanced
science class. On fourteen of the cards, the data were uniformly
positive or negative, which should lead to little variation in
assignment. On the remaining fourteen, however, the information was
incongruent (often the record was incomplete, also). Thus, the chosen
assignment could be an indicator of the degree to which the participant
attended to the test information as opposed to the nontest information.


Two modifications were made for this study concerning the subtest.
First, participants were given only fifteen of the cases to assign.
Two different forms were prepared, each having two common and thirteen
unique cases. Forms were then randomly assigned to the participants.
Subsequent examination of the two "anchor" items indicates no systematic
differences in responses could be ascribed to the form received. The
second difference is that the scoring procedure used in the Goslin
study was altered slightly. The final score, though, still represented
a relative percentage of preference of test versus nontest information.
Hence, scores over 50 indicate a more frequent dependence upon the
test data, while scores below 50 represent a more frequent use of the
nontest data in making the assignment. Internal consistency reliability
for this scale was .85. The items and scoring procedure are contained
in Appendix H.

Design

The research design was a hybrid quasi-experimental approach.
Using a modified Campbell and Stanley (1966) notation, the design
was:

                        Fifteen-day span

    A   O1   X   O2

    A   O1   X            O2

where: A  indicates random assignment by school (9 in all) to group
       X  denotes training in test interpretation
       O1 denotes pretest
       O2 denotes immediate or delayed posttest.

Logistic considerations prevented true random assignment, which would
have been preferable. Also, the retention test could not be spaced
as far behind the treatment as proposed due to school calendar
limitations. The design did allow for inferences as to pre-instructional
level, pre- to post-instructional changes and short-term retention.

Treatment 

The treatment, part of an on-going program in test development,
consisted of a single three-hour afternoon session. The general
topics which were covered emphasized interpretation of test score
data rather than attitudes toward tests and test data. An outline
of the session follows.

Training Session Outline

"Using Tests and Test Scores Wisely"

I. Introduction (20 minutes)

A. The many uses of test results

B. The many types of tests

C. The many types of test scores

II. Comparing different test scores (40 minutes)

A. Raw scores vs. derived scores

B. Common derived scores and their interpretation

C. Appropriate scores for norm-referenced, criterion-referenced
tests

III. Fallibility of test scores (30 minutes)

A. Errors of measurement

B. Confidence bands

C. Factors which affect accuracy of test scores

BREAK (20 minutes)

IV. Small group sessions (by grade level) (45 minutes)

A. Interpreting standardized test results

B. Interpreting criterion-referenced and classroom test results

C. Sample student records -- interpretation of test scores and
contrast with other achievement data

V. Summary and discussion (25 minutes)



Results 

Question 1: Can teachers' knowledge or perceptions of test score data
be changed as a result of short-term, directed training?

Summary statistics for all subtests on each occasion are presented
in Table VI-1. An increase in the overall mean scores was noted on
each of the subtests. Because no differences were observed between
the immediate and short-term retention outcomes, those results were
pooled. It may well be the case that there was considerable mental
note-exchanging taking place between the two groups, contaminating
the results to an indeterminate extent.

Simple pretest-posttest contrasts indicated statistically
significant gains for the knowledge subtest (F = 264.95; df = 1,244;
p < .001) and for the test-wiseness subtest (F = 64.92; df = 1,244;
p < .001). However, there were no systematic differences for either
the validity subtest (F = 0.52; df = 1,244) or the preference subtest
(F = 0.34; df = 1,244). These differences, expressed as effect sizes,
are summarized in Table VI-2. The net change for the knowledge
subtest was about a full standard deviation, while that for the
test-wiseness subtest was about one-half a standard deviation.

Intercorrelations among the subtests at each occasion are presented
in Table VI-3. It is interesting to note that the values changed only
modestly from the first test to the follow-up test.

Question 2: Are there differences in the degree of this change
attributable to teachers' measurement background, certification or
teaching level?

A multivariate analysis of variance (MANOVA) was calculated
for the subtest vector comparing the various teacher characteristics
on the pre-instructional test. These results are presented in Table
VI-4. The main effect of teaching level (elementary, secondary or
both) was significant at the .05 level, as was the certificate by
level interaction and the course by certificate by level interaction.
Univariate ANOVA contrasts were then calculated for each significant
effect. These results are contained in Table VI-5. Only one contrast
for each interaction was statistically significant at the .05 level.
For the certificate by teaching level interaction, the difference
was observed on the knowledge subtest. Cell means and sizes are


TABLE VI-1

Summary Statistics for
Knowledge and Perception Measures

                       Initial Test                Follow-up Test
Measure          Mean     S.D.    Alpha    K       Mean     S.D.

Validity        17.44     8.27     .so    8/8     17.72     8.52
Knowledge        3.96     1.69     .65    10       6.19     2.53
Test-wiseness    9.91     2.78     .77    U       11.37     2.88
Preference      64.63    21.35     .85    5/9     65.10    21.48

Note: Values based on 245 respondents.
K represents number of scored items; dual values indicate alternate forms.
TABLE VI-2

Overall Effect Sizes for Change in
Knowledge and Perception Scores

Validity          .03
Knowledge        1.04
Test-wiseness    0.51
Preference        .02

Note: Effect size is (X2 - X1)/S pooled, where X1 and X2 are the initial and follow-up means; values based on 245 respondents.
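
The effect sizes in Table VI-2 can be checked against the means and standard deviations in Table VI-1. The sketch below is illustrative only: the report does not say how the pooled standard deviation was formed, so the equal-weight pooling in `pooled_sd` is an assumption, and the names `SUMMARY`, `pooled_sd`, `effect_size` and `EFFECTS` are hypothetical.

```python
import math

# Means and standard deviations transcribed from Table VI-1
# (m1, s1 = initial test; m2, s2 = follow-up test).
SUMMARY = {
    "Validity":      {"m1": 17.44, "s1": 8.27,  "m2": 17.72, "s2": 8.52},
    "Knowledge":     {"m1": 3.96,  "s1": 1.69,  "m2": 6.19,  "s2": 2.53},
    "Test-wiseness": {"m1": 9.91,  "s1": 2.78,  "m2": 11.37, "s2": 2.88},
    "Preference":    {"m1": 64.63, "s1": 21.35, "m2": 65.10, "s2": 21.48},
}


def pooled_sd(s1, s2):
    """Assumed pooling rule: square root of the mean of the two variances."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / 2.0)


def effect_size(m1, s1, m2, s2):
    """Follow-up mean minus initial mean, in pooled standard deviation units."""
    return (m2 - m1) / pooled_sd(s1, s2)


EFFECTS = {name: effect_size(**row) for name, row in SUMMARY.items()}

if __name__ == "__main__":
    for name, d in EFFECTS.items():
        print(f"{name:14s} {d:5.2f}")
```

Under this pooling assumption the computed values agree with the tabled effect sizes to within about one hundredth, which is consistent with the description of a gain of roughly one standard deviation for knowledge and one-half for test-wiseness.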
TABLE VI-3

Product-moment Correlations Among
Knowledge and Perception Scores

                 Validity   Knowledge   Test-wiseness   Preference

Validity            --        -.33          -.27           -.37
Knowledge          -.30        --            .21            .20
Test-wiseness      -.29        .21           --             .57
Preference         -.36        .22           .53            --

Note: All values based on 245 respondents.

Upper diagonal values are initial test results; lower diagonal values
are second test results.
TABLE VI-4

Summary of MANOVA Contrasts of Teacher Groups
on Initial Knowledge and Perception Scores

Contrast           Wilks' Lambda   Approximate F      df       Probability

Test Course (T)        .997            0.13          4,205        .971
Certificate (C)        .967            0.86          8,410        .550
Level (L)              .925            2.05          8,410        .040
T x C                  .978            0.57          8,410        .801
T x L                  .976            0.64          8,410        .748
C x L                  .875            1.75         16,627        .034
T x C x L              .920            2.19          8,410        .027
TABLE VI-5

Selected Univariate ANOVA Contrasts of Teacher
Groups on Initial Knowledge and Perception Scores

Contrast = Level (df = 2,208):

Variable         Hypothesis MS   Error MS      F      Probability

Validity             15.39          5.59      2.75       .066
Knowledge             7.36          2.48      2.96       .054
Test-wiseness         0.04          7.21      0.01       .995
Preference          460.53        346.92      1.33       .267

Contrast = Certificate by Level (df = 4,208):

Variable         Hypothesis MS   Error MS      F      Probability

Validity              9.49          5.59      1.70       .152
Knowledge             6.90          2.48      2.78       .028
Test-wiseness        10.92          7.21      1.51       .200
Preference          630.82        346.92      1.82       .127

Contrast = Test Course by Certificate by Level (df = 2,208):

Variable         Hypothesis MS   Error MS      F      Probability

Validity             42.28          5.59      7.56       .001
Knowledge             1.39          2.48      0.56       .573
Test-wiseness         4.87          7.21      0.68       .510
Preference           41.81        346.92      0.12       .887
presented in Table VI-6. The primary "cause" of the interaction
appears to be the difference among the AA (M.S. or M.Ed.) level
certificate holders' pattern of means. Specifically, for elementary
teachers, the average is lower than that of the A certificate holders,
but is higher for secondary teachers.
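
The reversal described above can be made concrete with the cell means from Table VI-6. The following is a minimal descriptive sketch, not the ANOVA contrast itself: it ignores cell sizes and error variance, and the names `MEANS`, `aa_minus_a` and `interaction_contrast` are hypothetical.

```python
# Cell means for initial knowledge scores, transcribed from Table VI-6
# (A and AA certificate holders, elementary and secondary levels only).
MEANS = {
    ("A", "Elementary"): 3.98,
    ("A", "Secondary"): 3.99,
    ("AA", "Elementary"): 3.54,
    ("AA", "Secondary"): 4.71,
}


def aa_minus_a(level):
    """Difference between AA and A certificate holders at one teaching level."""
    return MEANS[("AA", level)] - MEANS[("A", level)]


def interaction_contrast():
    """Difference of differences: a nonzero value signals an interaction."""
    return aa_minus_a("Secondary") - aa_minus_a("Elementary")


if __name__ == "__main__":
    for level in ("Elementary", "Secondary"):
        print(f"{level:10s} AA - A = {aa_minus_a(level):+.2f}")
    print(f"Interaction contrast = {interaction_contrast():+.2f}")
```

The AA minus A difference changes sign between the two levels (negative for elementary, positive for secondary), which is the pattern the text identifies as the source of the certificate by level interaction.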

Table VI-7 presents cell means and sample sizes for the three-way
interaction on the validity subscale. The relatively small
numbers of teachers in the no test course group raise a question
as to how stable this interaction might be. The flip-flop between
higher and lower means across the certificate and teaching level
combinations explains why the interaction was significant; there
does not appear to be any clear-cut pattern, though, to this
interaction.

Results on the follow-up test scores adjusted for the initial
differences, again comparing the teacher characteristic groups, are
presented in Table VI-8. The MANCOVA results indicate that none of
the main effects or interactions was statistically significant. Hence,
follow-up univariate tests were not calculated.

Certain combinations of teacher characteristics did serve to
explain part of the initial differences observed on the validity and
knowledge subtests, but were not systematically related to the
degree of change on the set of subtests.

Summary

A short-term training session can effect significant gains
in teacher knowledge of interpretations of test scores as well as
in measured test-wiseness. No decrement in performance was observed
when teachers tested immediately after the instruction were compared
with teachers tested fifteen days later. No changes were observed
on the two perception subscales, which is not surprising since the
focus of the training was on cognitive rather than affective outcomes.

There were initial differences in knowledge of score interpretation
and perceived validity of test score data due to combinations of
certificate level, teaching level and measurement course work status.
When follow-up scores were adjusted for initial scores, no systematic
differences among the various combinations of teacher characteristics
were observed.


The implications of this study are important for future research
endeavors. First, presence or absence of measurement course work
does not appear to make much difference in knowledge or perceptions
of test score data. Perhaps elapsed time, unrelated content or a
combination of the two could explain why those teachers having had
TABLE VI-6

Certificate by Level Means for
Initial Knowledge Scores

                              Level
Certificate     Elementary    Secondary     Both

A                  3.98          3.99        3.50
                   (43)          (70)        (6)

AA                 3.54          4.71        7.00
                   (39)          (35)        (1)

AAA                4.06          4.73        2.50
                   (17)          (11)        (2)

Note: Numbers in parentheses are cell sizes.
TABLE VI-7

Test Course by Certificate by Level Means
for Validity Subtest

                                   Teaching Level
                 Elementary           Secondary              Both
Certificate   Course  No Course    Course  No Course    Course  No Course

A             17.35     15.78      15.85     15.88      15.67     13.00
              (34)      (9)        (46)      (24)       (3)       (3)

AA            15.59     19.71      16.21     16.00      14.00      --
              (32)      (7)        (28)      (7)        (1)

AAA           16.86     15.00      16.00     18.50      14.00      --
              (14)      (3)        (9)       (2)        (2)

Note: Numbers in parentheses are cell sizes.
TABLE VI-8

Summary of MANCOVA Contrasts of Teacher Groups
on Follow-up Knowledge and Perception Scores

Contrast           Wilks' Lambda   Approximate F      df       Probability

Test Course (T)        .968            1.69          4,201        .155
Certificate (C)        .964            0.92          8,402        .501
Level (L)              .973            0.70          8,402        .689
T x C                  .965            0.90          8,402        .514
T x L                  .963            0.97          8,402        .461
C x L                  .923            1.02         16,615        .432
T x C x L              .973            0.69          8,402        .701
measurement training performed no differently than those without.
Second, the cognitive skills related to knowledge of test data
interpretation and test-wiseness can be improved as a result of a
modest intervention. For the authors of studies which suggest that
sound understanding of the relevant principles is a necessary
precondition to sound decision-making, these results should be encouraging.
Finally, school systems should consider the possibility of devoting
at least some in-service time to enhancement of teachers' skills in
the interpretation of test score data. This is one of the few
examples of a policy from which virtually everyone, not the least
important being the child about whom the decisions are being made,
stands to benefit.
Appendix A
Validity Judgment Items
Key to Validity Judgment Items

Set   Item   Child's Gender   Child's Grade   Nontest Performance   Test Score

1      1          M                 8               Low                High
       2          F                10               High               Low
       3          F                 3               High               Low
       4          M                 4               High               High
       5          M                11               Low                Low
       6          F                 6               Low                High
       7          M                 7               High               Low
       8          F                 2               High               High

2      1          M                 5               Low                High
       2          F                 2               High               Low
       3          F                12               High               Low
       4          M                 9               High               High
       5          M                 6               Low                Low
       6          F                 9               Low                High
       7          M                 1               Low                Low
       8          F                 8               High               High
Set 1

For items 1-8, decide whether the newly received test score is valid,
questionable, or clearly invalid.

1. A male student, eighth grade, average grade of C-. New IQ score is 130.
   This score is:

   a. Valid
   b. Questionable
   c. Invalid

2. A female student, tenth grade, average grade of B+. New IQ score is 82.
   This score is:

   a. Valid
   b. Questionable
   c. Invalid

3. A female student, third grade, average grade of A-. New CAT-77 reading
   NCE is 36. This score is:

   a. Valid
   b. Questionable
   c. Invalid

4. A male student, fourth grade, average grade of 86 in mathematics. New
   CAT-77 math percentile is 90. This score is:

   a. Valid
   b. Questionable
   c. Invalid

5. A male student, eleventh grade, average grade of 74 in English. New
   CAT-77 language arts percentile is 38. This score is:

   a. Valid
   b. Questionable
   c. Invalid

6. A female student, sixth grade, average grade of D in reading and social
   studies. New CAT-77 reading comprehension percentile is 92. This score
   is:

   a. Valid
   b. Questionable
   c. Invalid

7. A male student, seventh grade, average grade of A in mathematics. New
   CAT-77 mathematics concepts and problem solving percentile is 28. This
   score is:

   a. Valid
   b. Questionable
   c. Invalid

8. A female student, second grade, average grade of "excellent" in reading.
   New CAT-77 reading vocabulary percentile is 78. This score is:

   a. Valid
   b. Questionable
   c. Invalid



Set 2

For items 1-8, decide whether the newly received test score is valid,
questionable, or clearly invalid.

1. A male student, fifth grade, average grade of "78" in English. New CAT-77
   language arts percentile is 93. This score is:

   a. Valid
   b. Questionable
   c. Invalid

2. A female student, second grade, average mark on report card is 90.
   New IQ score is 82. This score is:

   a. Valid
   b. Questionable
   c. Invalid

3. A female student, twelfth grade, with a semester average of 93 in Senior
   English. New CAT-77 language arts percentile is 30. This score is:

   a. Valid
   b. Questionable
   c. Invalid

4. A male student, ninth grade, with a grade average of B+ in Civics. New
   Stanford Achievement Test social studies percentile is 88. This score is:

   a. Valid
   b. Questionable
   c. Invalid

5. A male student, sixth grade, with an average grade in language arts of
   C-. New CAT-77 reading vocabulary percentile is 24. This score is:

   a. Valid
   b. Questionable
   c. Invalid

6. A female student, ninth grade, with a D average in home economics. New
   IQ score is 129. This score is:

   a. Valid
   b. Questionable
   c. Invalid

7. A male student, first grade, has several notations of "needs improvement"
   in mathematics on his report card. New Metropolitan Achievement Test
   score in mathematics is 24th percentile. This score is:

   a. Valid
   b. Questionable
   c. Invalid

8. A female student, eighth grade, with an average grade of A. New CAT-77
   reading comprehension percentile is 91. This score is:

   a. Valid
   b. Questionable
   c. Invalid
Name

Last four digits of
social security number ______

Directions

For each item, please select the answer you believe to be best, based on
your own experience. There are no "right" or "wrong" answers. For each
item, circle the letter of the answer you select.

There is a total of 10 items to be answered on 2 pages.

Please answer each item and do your own work.

Each year, the state sponsors testing of students in grades 4, 6, and 8
in basic skills on the California Achievement Test. Various types of
scores are provided for students who take the test. For each of the
following items, please select the type of score you believe would best
help you, as an educator, to make sound decisions about what a student
had or had not learned.

Please circle the letter of the type of score you select for each item.

Remember, you should choose the type of score YOU think would best help
in making decisions about a student's skills.

1.  a. Percentile rank (national)
    b. Scale score (ADSS, a C.A.T. scale)

2.  a. The raw score (number right on test)
    b. Grade-equivalent score

3.  a. Grade-equivalent score
    b. Percentile rank

4.  a. Scale score
    b. Stanine (national)

5.  a. Stanine
    b. The raw score

6.  a. Percentile rank
    b. Stanine

7.  a. Grade-equivalent score
    b. Scale score

8.  a. Stanine
    b. Percentile rank

9.  a. Stanine
    b. Grade-equivalent score

10. a. Scale score
    b. The raw score
APPENDIX C

Pair Comparison Stimuli
for Information Types

Name

Last four digits of
social security number ______

Directions

Read each item carefully and respond based on your own beliefs and
experiences. There are no "right" or "wrong" answers.

There is a total of 21 items on 3 pages. For each item, circle the
letter of the answer you select.

Please answer each item and do your own work.
When a new student comes to your school, some type of placement decision
must be made. For each of the following questions, please circle the
letter of the type of information you believe is likely to be MOST
ACCURATE for making sound placement decisions.

1.  a. The previous year's grades or marks.
    b. The previous year's standardized achievement test scores.

2.  a. The results of an individual I.Q. test.
    b. The written recommendation of the last teacher.

3.  a. The previous year's standardized achievement test scores.
    b. The parents' description of the child's school accomplishments.

4.  a. The written recommendation of the previous school counselor.
    b. The results of an individual I.Q. test.

5.  a. The previous year's local criterion-referenced achievement test
       scores.
    b. The results of an individual I.Q. test.

6.  a. The parents' description of the child's school accomplishments.
    b. The written recommendation of the last teacher.

7.  a. The previous year's standardized achievement test scores.
    b. The written recommendation of the previous school counselor.

8.  a. The previous year's grades or marks.
    b. The previous year's local criterion-referenced achievement
       test scores.

9.  a. The written recommendation of the previous school counselor.
    b. The parents' description of the child's school accomplishments.

10. a. The parents' description of the child's school accomplishments.
    b. The previous year's local criterion-referenced achievement test
       scores.

11. a. The results of an individual I.Q. test.
    b. The previous year's grades or marks.

12. a. The written recommendation of the last teacher.
    b. The written recommendation of the previous school counselor.

13. a. The previous year's standardized achievement test scores.
    b. The results of an individual I.Q. test.

14. a. The written recommendation of the last teacher.
    b. The previous year's local criterion-referenced achievement
       test scores.

15. a. The written recommendation of the previous school counselor.
    b. The results of an individual I.Q. test.

16. a. The previous year's grades or marks.
    b. The parents' description of the child's school accomplishments.

17. a. The previous year's local criterion-referenced achievement test
       scores.
    b. The written recommendation of the previous school counselor.

18. a. The previous year's standardized achievement test scores.
    b. The written recommendation of the last teacher.

19. a. The results of an individual I.Q. test.
    b. The parents' description of the child's school accomplishments.

20. a. The written recommendation of the last teacher.
    b. The previous year's grades or marks.

21. a. The previous year's local criterion-referenced achievement
       test scores.
    b. The previous year's standardized achievement test scores.
Hypothetical Protocol: Form A

Use the following information to answer items 1-2.

Student number 72-0013        Sex            Birth 04/16/70

Grade   Year      School     Fall G.P.A.   Spring G.P.A.   Absent   Teacher
3       1978-79   Westside                      80            9      Brooks
4       1979-80   Westside                      83            8      Miller
5       1980-81   Westside                                           Smith

1979-80 MEAP CAT-77
                       R-TOT    M-TOT    LA-TOT
ADSS                    430      423      438
National Percentile      62       54       66
NCE                      56       52       58

Otis-Lennon IQ/Spring 1979
89

MEAP SFTAA/Spring 1980
IQ     83
RSS    388

1. This information suggests that this student is performing at a level:

   a. Well above her ability
   b. About equal with her ability
   c. Well below her ability

2. Which type of information is likely to be the most reliable on this
   record?

   a. The G.P.A.
   b. The CAT-77 achievement subtest percentiles
   c. The Otis-Lennon IQ
   d. The SFTAA IQ
Hypothetical Protocol: Form B

Use the following information to answer items 1-2.

Student number 72-0013        Sex F          Birth 04/16/70

Grade   Year      School     Fall G.P.A.   Spring G.P.A.   Absent   Teacher
3       1978-79   Westside       80             79            9      Brooks
4       1979-80   Westside       79             83            8      Miller
5       1980-81   Westside       78                                  Smith

1979-80 MEAP CAT-77
                       R-TOT    M-TOT    LA-TOT
National Percentile      52       54       46
NCE                      52       55       44

Otis-Lennon IQ/Spring 1979

MEAP SFTAA/Spring 1980
IQ     118
RSS    472

1. This information suggests that this student is performing at a level:

   a. Well above her ability
   b. About equal with her ability
   c. Well below her ability

2. Which type of information is likely to be the most reliable on this
   record?

   a. The G.P.A.
   b. The CAT-77 achievement test subtest percentiles
   c. The Otis-Lennon IQ
   d. The SFTAA IQ



82 






Appendix E 
Percentile Ranks Judgment Measure 
Key to Percentile Ranks Judgment Measure

Set Entry   z-score   Rank   ±1/3 S.D. Band   ±1/2 S.D. Band   ±1 S.D. Band

    1        -0.50     31        21-42            16-50           07-69
    2        +2.00     98        96-99            93-99           84-99
    3        +0.50     69        58-79            50-84           31-93
    4        -1.50     07        04-12            02-16           01-31
    5         0.00     50        36-62            31-69           16-84
    6        +1.50     93        88-96            84-98           69-99
    7        -1.00     16        10-24            07-31           02-50
    8        -2.00     02        01-04            01-07           01-16
    9        +1.00     84        76-90            69-93           50-98
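
The ranks and bands in this key appear to follow from the standard normal distribution. The sketch below shows one way they could be regenerated. It is an illustration under stated assumptions rather than the procedure used in the study: the rounding rule, the clamping of results to the 1-99 range, and the names `normal_cdf`, `percentile_rank`, `percentile_band` and `Z_SCORES` are all hypothetical.

```python
import math

# z-scores of the nine entries, in the order listed in the key.
Z_SCORES = [-0.5, 2.0, 0.5, -1.5, 0.0, 1.5, -1.0, -2.0, 1.0]


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_rank(z):
    """Percentile rank for a z-score, rounded and kept within 1 to 99 (assumed rule)."""
    pct = round(100.0 * normal_cdf(z))
    return min(99, max(1, pct))


def percentile_band(z, half_width):
    """Percentile band spanning half_width standard deviations either side of z."""
    return percentile_rank(z - half_width), percentile_rank(z + half_width)


if __name__ == "__main__":
    for z in Z_SCORES:
        low_half, high_half = percentile_band(z, 0.5)
        low_one, high_one = percentile_band(z, 1.0)
        print(f"z={z:+.2f}  rank={percentile_rank(z):02d}  "
              f"half-SD band={low_half:02d}-{high_half:02d}  "
              f"one-SD band={low_one:02d}-{high_one:02d}")
```

Under these assumptions the computed ranks and the one-half and one standard deviation bands match the key. Several of the one-third standard deviation bands in the key differ from this calculation by about a point, so they were presumably derived from a printed table or a slightly different rule.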
Name:

Last Four Digits of
Social Security Number:
Directions

On the following sheets, you will find a number of test scores,
expressed as percentile ranks or percentile bands.

Your percentile rank tells the percentage of a norm group that
you have equaled or surpassed. For example, if your percentile rank
for height in this class is 75, then you are as tall or taller than
75% of the persons in the class.

Because test scores tend to vary somewhat due to such chance
factors as a lucky guess or the choice of questions, we sometimes
express a score as a percentile band. The percentile band 50-75,
for example, would mean that we are reasonably confident that the
person earning this score is really better than the lower half of the
group, but not as good as the top quarter of the group.

When the signal is given, open your booklet to page 1, and begin
to work. Be sure that you finish each page before going on to the
next page. DO NOT TURN BACK TO A PAGE ONCE YOU HAVE LEFT IT. WAIT
FOR THE SIGNAL TO START.



Rating Key:   5=Score is well above mean
              4=Score is somewhat above mean
              3=Score is equal or nearly equal to mean
              2=Score is somewhat below mean
              1=Score is well below mean

Percentile Rank     Rating (Circle one for each given rank)

      31            1    2    3    4    5
      98            1    2    3    4    5
      69            1    2    3    4    5
      07            1    2    3    4    5
      50            1    2    3    4    5
      93            1    2    3    4    5
      16            1    2    3    4    5
      02            1    2    3    4    5
      84            1    2    3    4    5
Rating Key:   5=Score is well above mean
              4=Score is somewhat above mean
              3=Score is equal or nearly equal to mean
              2=Score is somewhat below mean
              1=Score is well below mean

Percentile Band     Rating (Circle one for each given band)

     21-42          1    2    3    4    5
     96-99          1    2    3    4    5
     58-79          1    2    3    4    5
     04-12          1    2    3    4    5
     38-62          1    2    3    4    5
     88-96          1    2    3    4    5
     10-24          1    2    3    4    5
     01-04          1    2    3    4    5
     76-90          1    2    3    4    5
Rating Key:   5=Score is well above mean
              4=Score is somewhat above mean
              3=Score is equal or nearly equal to mean
              2=Score is somewhat below mean
              1=Score is well below mean

Percentile Band     Rating (Circle one for each given band)

     16-50          1    2    3    4    5
     93-99          1    2    3    4    5
     50-84          1    2    3    4    5
     02-16          1    2    3    4    5
     31-69          1    2    3    4    5
     84-98          1    2    3    4    5
     07-31          1    2    3    4    5
     01-07          1    2    3    4    5
     69-93          1    2    3    4    5
Rating Key:   5=Score is well above mean
              4=Score is somewhat above mean
              3=Score is equal or nearly equal to mean
              2=Score is somewhat below mean
              1=Score is well below mean

Percentile Band     Rating (Circle one for each given band)

     07-69          1    2    3    4    5
     84-99          1    2    3    4    5
     31-93          1    2    3    4    5
     01-31          1    2    3    4    5
     16-84          1    2    3    4    5
     69-99          1    2    3    4    5
     02-50          1    2    3    4    5
     01-16          1    2    3    4    5
     50-98          1    2    3    4    5
Appendix F

Loss Ratio and Likelihood Ratio
Estimate Measures
Key to Loss Ratio and
Likelihood Ratio Stimuli: Both Versions

                                              Responses
Loss Ratio Item     Test Item     False Positive     False Negative

       1                4               a                  b
       2                5               a                  b
       3                6               b                  a
       4                7               a                  b
       5                8               b                  a
       6                9               b                  a

                                                   Responses
Likelihood Ratio Item     Test Item     False Positive     False Negative

          1                  10               a                  b
          2                  11               b                  a
          3                  12               b                  a
          4                  13               a                  b
          5                  14               a                  b
          6                  15               b                  a
          7                  16               a                  b
FORM A: ATYPICAL STUDENT VERSION

Name

Last four digits of
social security number ______

Directions

Read each item carefully and respond based on your own beliefs and
experiences. Except for the last four questions, there are no "right"
or "wrong" answers.

For items 1-3, you will have to write in your response. For items
4-21, please circle the letter of the answer you select.

Please attempt every item and do your own work.

For questions 1-3, please choose your answers so that the numbers sum to 100.

Based on your own experiences and observations, when students take the
California Achievement Test:

1. What percent of these students receive a score which is
   a fairly accurate reflection of their skills?

2. What percent of these students receive a score which is
   much lower than their true capability?

3. What percent of these students receive a score which is
   much higher than their true capability?

For questions 4-9, select the statement which you believe is the WORSE
of the pair of statements.

4. Which is WORSE:

   a. Accidentally placing a poor student in an advanced group or class.
   b. Accidentally placing a good student in a remedial group or class.

5. Which is WORSE:

   a. A student passing a test who really doesn't know the material.
   b. A student failing a test who really does know the material.

6. Which is WORSE:

   a. Making a student review material even though he or she knows
      the material well.
   b. Advancing a student to new material before he or she is ready.

7. Which is WORSE:

   a. Moving through material too quickly for most of the students.
   b. Moving through material too slowly for most of the students.

8. Which is WORSE:

   a. A student just barely failing a test who probably knows the material.
   b. A student just barely passing a test who probably does not know
      the material.

9. Which is WORSE:

   a. A student forced to re-study material in a unit even though he
      or she really understands it.
   b. A student who is confused over the material in a unit because
      he or she didn't master earlier units.

For questions 10-16, select the statement which you believe is the MORE
LIKELY of the pair of statements to occur. C.A.T. means California
Achievement Test.

10. Which is MORE LIKELY:

    a. A generally poor student turns in a very good paper.
    b. A generally good student turns in a very poor paper.

11. Which is MORE LIKELY:

    a. A very good student receives a C.A.T. test score which is far too low.
    b. A very poor student receives a C.A.T. test score which is far too high.

12. Which is MORE LIKELY:

    a. A very poor student receives a C.A.T. test score which is far too low.
    b. A very good student receives a C.A.T. test score which is far too high.

13. Which is MORE LIKELY:

    a. A generally poor student performs very well on a classroom test.
    b. A generally good student performs very poorly on a classroom test.

14. Which is MORE LIKELY:

    a. A student who should be given a failing semester grade is passed.
    b. A student who should be given a passing semester grade is failed.

15. Which is MORE LIKELY:

    a. A student is placed in a group or class which is too low.
    b. A student is placed in a group or class which is too high.

16. Which is MORE LIKELY:

    a. A student who should fail a classroom test somehow passes.
    b. A student who should pass a classroom test somehow fails.

17. When a new student comes to your school, what type of information
    is most likely to be most accurate for making a placement decision?

    a. The previous year's grades or marks.
    b. The previous year's standardized achievement test scores.
    c. The written recommendation of the last teacher.
    d. The parents' description of the child's accomplishments.
    e. The results of an individual I.Q. test.

18. Have you ever taken a college or graduate course in Tests and
    Measurement?

    a. Yes
    b. No

19. What is the highest current certification which you hold?

    a. A
    b. AA
    c. AAA
    d. No current certification

20. Which best describes your school job?

    a. Mostly or entirely teaching duties.
    b. Mostly or entirely administrative duties.
    c. About equally divided between teaching and administrative duties.

21. Would you like a summary of the results of this survey when it is
    complete?

    a. Yes
    b. No
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FORM B: AVE RAGE STUDENT VERSION 



Name 

Last four digits of / 
social security number 



Directions 



\ 



Read each i tern careful ly 'and respond based on .your own beliefs atod 
experiences. Except for the iast four questons, there are no 
."right" or "wrong" answe'rs. ' . \ 

For items 1-3, you will have to write in your response. For items 
4-20, please circle the letter of the answer you select. 

Please attempt every n" tern and do your own work. 4 
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For questions 1-3, pi ease ' choose your answers so that the -numbers sum to 1 00' 

* ■ r " ' . ■ 

Based on your own experiences «and observations, when students take the 
Cal i fornia Achievement Test: . T 

1. What percent of these students recei ve a score which^s 

a fairly accurate reflection of their skills? / * 



2. What percent of these students receive a score, which is 
much lower than their true capability? 

3. What percent of these students receive a score which is. 
much higher than their true capability? 



For questions 4-9,. select the statement which yoif believe ifc the WdRSE 
of the pair of statements. For each question, the 1 student^ of AVERAGE 
achievement level. . - *• 4 

4. Which is WORSE: 

a: Accidentally placing student in an advanced 'group .or class. 
. b". Accidentally placing student in a remedial group or clas^ppr 

5. Which ts WORSE: ■ ' * • *. . 

a. A student passing a'test„who really doesn 1 t know the material. 
\ b. A student failing a test who really does know the material. 

6. Which 'is WQRSE: 

a. » Making a student review material even though he or she knows 

the material we] 1 . - 

b. Advancing a, student to new material before he- or she is ready. 

7. Which is WORSE: 

a. Moving through material too quickly for most of the students. 

b. Moving through' material tocTslowly for most of the students. 
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8. Which is WORSE: ' ■» % 

a. A student just bare\y failing a test who probably knows the 
material. 1 

b, A student just barely passing a test who probably does not 'know / - 
the material . v * 



9. Which is WORSE: * 

,a. A student forced' to re-study material in a u/iit even though he 
or she real lyl understands it.^* e * 

b. A student who^s confused over the material in. a unit because, 
he or she didn't master earlier units* ■ ' 

/ ' , 

For questions 10-16, select the statement whtch you believe- is theilORE 
LIKELY"Of the pair of statements to occur. C.A.T. means California- * 
Achievement Test. For each question, the student is of AVERAGE achievement 
level . "* 

10. Which is MORE LIKELY:

a. A student turns in a very good paper.
b. A student turns in a very poor paper.

11. Which is MORE LIKELY:

a. A student receives a C.A.T. test score which is far too low.
b. A student receives a C.A.T. test score which is far too high.

12. Which is MORE LIKELY:

a. A student receives a C.A.T. test score which is slightly low.
b. A student receives a C.A.T. test score which is slightly high.

13. Which is MORE LIKELY:

a. A student performs very well on a classroom test.
b. A student performs very poorly on a classroom test.
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14. Which is MORE LIKELY:

a. A student who should be given a failing semester grade is
passed.
b. A student who should be given a passing semester grade is failed.

15. Which is MORE LIKELY:

a. A student is placed in a group or class which is too low.
b. A student is placed in a group or class which is too high.

16. Which is MORE LIKELY:

a. A student who should fail a classroom test somehow passes.
b. A student who should pass a classroom test somehow fails.

17. Have you ever taken a college or graduate course in Tests and
Measurement?

a. Yes
b. No

18. What is the highest current certification which you hold?

a. A
b. AA
c. AAA
d. No current certification

19. Which best describes your school job?

a. Mostly or entirely teaching duties.
b. Mostly or entirely administrative duties.
c. About equally divided between teaching and administrative duties.
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Appendix G 

Estimation of Pupil Performance Stimuli
and Sample Booklet
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1 Op- 



Key to Estimation of Pupil Performance Stimuli

Order  Stimulus   Initial Valence  Reliability  Follow-up Valence
1      Initial    Positive         N/A          N/A
2      Initial    Negative         N/A          N/A
3      Follow-up  N/A              Reliable     Positive
4      Follow-up  N/A              Reliable     Negative
5      Follow-up  N/A              Unreliable   Positive
6      Follow-up  N/A              Unreliable   Negative

Booklet is an example of the following order: 1, 5 for a male student.



Stimuli

Carol is ten years old and beginning the fifth grade. She lives with
her parents, an older brother, and two younger sisters. In an interview
with her parents, her father gave his occupation as an engineer in
an aerodynamics firm. In the interview her parents also noted that
Carol spent about two hours each evening on her homework and reading
books. On an individual intelligence test, Carol scored quite high.

Note: All stimuli were generated for both a male student (Andrew)
and a female student (Carol).

Carol is ten years old and beginning the fifth grade. She lives with
her parents, an older brother, and two younger sisters. In an interview
with her parents, her father gave his occupation as a machinist
for an aerodynamics firm. In the interview, her parents also noted
that Carol never did any homework but spent two hours each evening
watching television. On an individual intelligence test, Carol scored
quite low.

At mid-semester, Andrew was tested in math and reading. The results
showed that he was performing at about seventh grade level, approximately
two years ahead of expectations for his age. The school
psychologist reported that Andrew's curiosity enhanced his ability
to do well in his math and reading, and that he had an enthusiastic
and positive attitude toward school.

At mid-semester, Carol was tested in math and reading. The results
showed that she was performing at about third grade level, approximately
two years behind expectations for her age. The school
psychologist reported that Carol had difficulty in directing her
curiosity to school activities, often becoming distracted and losing
interest in class discussions, and that she had a negative attitude
toward school.

When interviewed, some of Carol's classmates said that they liked her
and that they thought she was a good student. Cathy Robbins, an
education student at a nearby college, had been hired as a substitute
aide at Carol's school. She had assisted in Carol's class for a few
days and had decided to administer an inkblot test to the class. She
interpreted the results to mean that Carol was curious and enthusiastic
about academic activities and that she had a positive attitude
toward school.

When interviewed, some of Carol's classmates said that they didn't
particularly like her and that they thought she wasn't a very good
student. Cathy Robbins, an education student at a nearby college,
had been hired as a substitute aide in Carol's school. She had
assisted in Carol's class for a few days and decided to administer an
inkblot test to the class. She interpreted the results to mean that
Carol's curiosity led her to be easily distracted from academic
activities and that she had a negative attitude toward school.






Sample Booklet: Estimation of Pupil Performance 



Name

Last four digits of
social security number

Directions



Attached is information concerning a new student. You are to read the
information carefully, then answer the questions which follow. Please
answer all questions with the response you believe to be best.

Once you have turned a page, do not turn back. 

Begin when your instructor tells you to start. 






Andrew is ten years old and beginning the fifth grade. He lives with
his parents, an older brother, and two younger sisters. In an interview
with his parents, his father gave his occupation as an engineer in
an aerodynamics firm. In the interview his parents also noted that
Andrew spent about two hours each evening on his homework and reading
books. On an individual intelligence test, Andrew scored quite high.

Please turn to the next page and answer the questions.
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Note: For items 2-4, circle the letter of the answer you choose. 



1. What are the chances (between 0% and 100%) that Andrew will get
mostly A's and B's on his report card?

(Please write in your estimate)



2. In selecting instructional materials for Andrew in reading and
math at the beginning of the semester, what kinds of texts and
instructional aids would you primarily use?

a. Fifth grade level
b. Fifth grade level and/or higher level
c. Fifth grade level and/or lower level

3. Suppose that, during a math lesson, you asked Andrew a question
and he hesitated. Would you:

a. rephrase the same question in order to clarify it
b. ask a similar question that is easier to answer
c. further explain the problem, then repeat the same question
d. ask the same question to another student
e. answer the question yourself

4. How important is it for Andrew that you make a point of praising
him every time he does good work?

a. very important
b. important
c. somewhat important
d. somewhat unimportant
e. not important at all
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When interviewed, some of Andrew's classmates said that they liked him
and that they thought he was a good student. Cathy Robbins, an
education student at a nearby college, had been hired as a substitute
aide at Andrew's school. She had assisted in Andrew's class for a few
days and had decided to administer an inkblot test to the class. She
interpreted the results to mean that Andrew was curious and enthusiastic
about academic activities and that he had a positive attitude
toward school.

Please turn to the next page and answer the questions.
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Note: For items 2-4, circle the letter of the answer you choose.



1. What are the chances (between 0% and 100%) that Andrew will get
mostly A's and B's on his report card?

(Please write in your estimate)

2. In selecting instructional materials for Andrew in reading and
math at the beginning of the semester, what kinds of texts and
instructional aids would you primarily use?

a. Fifth grade level
b. Fifth grade level and/or higher level
c. Fifth grade level and/or lower level

3. Suppose that, during a math lesson, you asked Andrew a question
and he hesitated. Would you:

a. rephrase the same question in order to clarify it
b. ask a similar question that is easier to answer
c. further explain the problem, then repeat the same question
d. ask the same question to another student
e. answer the question yourself

4. How important is it for Andrew that you make a point of praising
him every time he does good work?

a. very important
b. important
c. somewhat important
d. somewhat unimportant
e. not important at all






Key to Knowledge and Perception Subscales

Subscale: Validity
Number of items: 8
Where found: Appendix A
Notes: Each of these items presented nontest and test information.
Responses were coded as:

    Valid = 3
    Questionable = 2
    Invalid = 1

The combinations of information types, in order, were
LO-HI, HI-LO, HI-LO, HI-HI, LO-LO, LO-HI, HI-LO, HI-HI,
where the first entry is the nontest information and the
second is the test score.

Possible score range was, therefore, from 8-24, where a
high score represents belief in validity of the test score
(in conjunction with given nontest score).

Subscale: Knowledge
Number of items: 10
Where found: I. 1-10
Notes: Each item was simply scored as right or wrong. The
keyed responses were, in order: E, D, B, D, E, B, B, D, C, D.
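
As an illustration only, the scoring rules for the Validity and Knowledge subscales described above could be sketched in Python as follows. The function and variable names are hypothetical and do not come from the report; the coding values (3, 2, 1) and the keyed responses are taken from the key itself.

```python
# Illustrative sketch; names are hypothetical, values come from the key above.
VALIDITY_CODES = {"valid": 3, "questionable": 2, "invalid": 1}
KNOWLEDGE_KEY = ["E", "D", "B", "D", "E", "B", "B", "D", "C", "D"]


def validity_score(responses):
    """Sum the coded responses to the 8 validity items (possible range 8-24)."""
    if len(responses) != 8:
        raise ValueError("expected 8 validity responses")
    return sum(VALIDITY_CODES[r.strip().lower()] for r in responses)


def knowledge_score(answers):
    """Count the answers that match the keyed response (possible range 0-10)."""
    if len(answers) != len(KNOWLEDGE_KEY):
        raise ValueError("expected 10 knowledge answers")
    return sum(
        a.strip().upper() == key for a, key in zip(answers, KNOWLEDGE_KEY)
    )
```

Under this sketch, a respondent who marks every validity item "Valid" receives the maximum of 24, and one who marks every item "Invalid" receives the minimum of 8, which matches the score range stated in the key.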
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Key to Knowledge and Perception Subscales (continued) 



Subscale: Test-wiseness
Where found: II. 1-14
Notes:

Item   Key or Guide
1      Any response ("guess")
2      C (Absurd alternatives, plus clue of "ologist")
3      B (Stem asks for one meaning)
4      (Answer is given by item 8)
5      A ("Smallest" number is the clue)
6      A (Answers B, C, D mean the same thing)
7      C (Correct response is very different in length)
8      B (Grammatical clue: "a")
9      B (Correct response is very different in length)
10     C (Resemblance of stem and correct alternative)
11     B (Stem asks for two outcomes, only 'B' gives two)
12     D (A, B, C mean the same thing)
13     B (Answer is given away by choices in item 1)
14     D (A, B, C mean the same thing)



Subscale: Preference
Number of items: 5/9
Where found: III. A, III. B
Notes: The items used in this section were given weights to
reflect the relative degree of dependence upon test
score, as opposed to nontest score, information.
The rating scale weights were as follows:

A. If the nontest information was used for the decision:

1 = Incomplete nontest data and complete test data presented
2 = Incomplete nontest data and incomplete test data presented
3 = Complete nontest data and complete test data presented
4 = Complete nontest data and incomplete test data presented
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Key to Knowledge and Perception Subscales (continued) 





Subscale 


Number of items 


Where found . 


No.tes 




B. If the test 
decision: 


information was used 


for the 



5 = Completcvtest data and incomplete non- 

test darea presented ™ 

6 = Complete fee\t clata and complete nontest 

data pri 

7 = Incotnplete^ijlsi; data and incomplete non- 

test data presented 

8 - Incomplete test -data and complete non- 

test data presented 

The racing* scale,, from l-8,j represents -increasingly 
higher decrees of dependence upon test data .as the 
value goes up. Lower ratings represent a lower 
degree of dependence upon ties.t data. The overall 
preference score was calculated as a percentage of 
the maximum possibJfi score ;f or each form of the 
instrument. High percentages would indicate strong 
deperidence upon the test score data. Only those 
items for which confl ictingiinformation was presented 
were scored. i f 



Part  Form  Item  Response Ratings
III   A     23    A = 7, B = 2
            24    A = 2, B = 7
            28    A = 1, B = 5
            29    A = 1, B = 5
            33    A = 8, B = 4
III   B     23    A = 6, B = 3
            24    A = 5, B = 1
            26    A = 8, B = 4
            27    A = 4, B = 8
            29    A = 8, B = 4
            31    A = 6, B = 3
            32    A = 6, B = 3
            33    A = 3, B = 6
            34    A = 4, B = 8
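
The preference score described in the key (a percentage of the maximum possible score on a form) might be computed along the following lines. This is a sketch under stated assumptions: the names are hypothetical, the Form A ratings are as read from the key, and the "maximum possible score" is assumed here to be the sum of the higher of the two ratings for each scored item, which the report does not spell out.

```python
# Illustrative sketch; names are hypothetical. Ratings are the Form A values
# as read from the key (A = accelerated science, B = regular science).
FORM_A_RATINGS = {
    23: {"A": 7, "B": 2},
    24: {"A": 2, "B": 7},
    28: {"A": 1, "B": 5},
    29: {"A": 1, "B": 5},
    33: {"A": 8, "B": 4},
}


def preference_percent(choices, ratings):
    """Return the earned rating total as a percentage of the assumed maximum.

    choices maps item number -> "A" or "B" for every scored item.
    The maximum is ASSUMED to be the sum of the higher rating per item.
    """
    earned = sum(ratings[item][choices[item]] for item in ratings)
    maximum = sum(max(options.values()) for options in ratings.values())
    return 100.0 * earned / maximum
```

For example, a respondent who chose "A" on all five scored Form A items would earn 7 + 2 + 1 + 1 + 8 = 19 of an assumed maximum of 32, or about 59 percent; higher percentages indicate stronger dependence upon test score data.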
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I. Knowledge 



Use the following data to answer items 1-2.

             PRETEST ITEM        POSTTEST ITEM
Student    1   2   3   4   5     1   2   3   4   5
John       +   +   0   0   0     +       +   0   +
Ann        0   +   +   0   0     +   0   +   0   0
Susan      +   +   +   0   0     +   +   +       +
Bill       0   0   0       0     +   +   +   0   +
Pete       +   0   0   +   0     +   +   +       +

+ = correct response; 0 = incorrect response

1. Which item shows greatest sensitivity to instruction?

a.
b.
c.
d.
e.

2. If each item represents a different skill, what skill was learned
(or taught) least well?

a.
b.
c.
d.
e.


3. A particular test item has a difficulty index of .36. Teacher A says
this means that 36% of the examinees missed the item. Teacher B says
this item was a hard one for the examinees. Who is correct?

a. A only
b. B only
c. both A and B
d. neither A nor B
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4. A student receives a percentile rank of 74 on a social studies
achievement test. Teacher A says this means that 74% of the norms
group did as well or better than this student. Teacher B says the
student got 74% of the items correct. Who is correct?

a. A only
b. B only
c. both A and B
d. neither A nor B

5. On the CTBS (California Tests of Basic Skills), John obtains a raw
score of 54 in mathematics concepts. On the CAT (California Achievement
Test), Bill obtains a raw score of 44 in mathematics concepts.
One appropriate conclusion is:

a. John is more proficient in mathematics concepts than Bill.
b. John and Bill are equally proficient in mathematics concepts.
c. John's true score in mathematics concepts is higher than Bill's.
d. John answered a larger proportion of items correctly than did Bill.
e. No comparison should be made between these two scores.

6. A child performs at the 37th percentile on a nationally normed
achievement test. If the child's ranking had been incorrectly
determined by referring to a norms table for schools, the resulting
percentile rank would be:

a. higher
b. lower
c. unchanged

7. On an achievement test, two fourth grade students, Peter and Jane,
received grade-equivalent scores of 4.4 and [illegible], respectively.
Teacher A says Jane did as well on the test as the average eighth-grade
students. Teacher B says Peter answered fewer items correctly than
Jane. Who is correct?

a. A only
b. B only
c. both A and B
d. neither A nor B
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8. A student received a grade-equivalent score of 10.2. This score
indicates that:

a. He ranks in his class at the equivalent of a rank of 10.2 for the
grade 10 students of the normative group.
b. He should be placed in the tenth grade in instruction in this
subject.
c. His raw score is the same as the median score earned by all
students in the norm group who were 10.2 years old at the time
of testing.
d. His raw score on this test is the same as the approximate median
of scores made by pupils in the second month of the tenth grade.

9. Which of the following indicates the BETTER performance on a normed
test?

a. A percentile rank of 65
b. An NCE score of 40
c. A T-score of 60
d. There is no way to distinguish among the scores.

10. Which of the following indicates the POORER performance on a normed
test?

a. A 68% confidence band in percentiles of 38-54
b. An NCE score of 45
c. A 95% confidence band in percentiles of 30-62
d. There is no way to distinguish among the scores.

II. Test-Wiseness 




For items 1-14, choose the best answer. Each item except one suffers
from a common item construction flaw.

1. One resistor of 30 ohms is wired in parallel with a resistor of
60 ohms. What is the total resistance?

a. 20 ohms
b. 45 ohms
c. 60 ohms
d. 90 ohms

2. An ornithologist is a person who

a. sells shoes.
b. drives a taxi cab.
c. studies birds.
d. plays a violin.

3. What is one meaning of the word panache?

a. ormolu or frantic
b. a bunch of feathers on a helmet
c. pandemonium or hoopla
d. helve, fractious, and chanteuse

4. Which of the following means "How are you?"

a. Maintenant, aujourd'hui?
b. Comment allez-vous?
c. N'est-ce pas?
d. Très bien, et vous?

5. If you had one hour to answer fifty (50) multiple-choice questions, what
is the smallest number you should have answered in a half-hour?

a. 10
b. 25
c. 30
d. 45
e. 50

6. When Bestor crystals are added to water,

a. the water turns blue.
b. the temperature rises.
c. heat is given off.
d. a thermometer will read higher.

7. How have scientists recognized the great work of Linnaeus?

a. By giving him the Nobel prize.
b. By founding a college with his name.
c. By adding the letter L. to the names of all the animals he had
classified.
d. By awarding him a cash prize.

8. "Comment allez-vous?" which means "How are you?" is a:

a. old English saying.
b. French expression.
c. Italian phrase.
d. Arabic question.
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9. To change a verb like cook to a gerund, you could

a. double the consonant and add the letters -ies.
b. add -ing.
c. change the verb to a predicate adjective, such as pressure cooker.
d. capitalize the first letter, and add -ed to the word or sentence.

10. Another word for convivial is

a. voracious.
b. inextricable.
c. jovial.
d. placebo.

11. In the southern United States, two outcomes of the Civil War were

a. slavery flourished in most states.
b. reconstruction and the abolition of slavery.
c. more wars in mainland China during 1871-1880.
d. fewer plantations in Alabama.

12. If something is flammable, it will

a. resist burning.
b. not catch on fire.
c. not be consumed by flames.
d. easily ignite.

13. If a resistor of 60 ohms is wired in parallel with a resistor of
30 ohms, the total resistance is 75 ohms (60 + ½ x 30).

a. True
b. False

14. An example of an opening in a room is

a. a window.
b. an egress.
c. a doorway.
d. all of the above.
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III. A. Preference 



For items 21-35, you are to read each record card for a student. All
scores given are percentile ranks. The interest area scores are from the
Kuder Preference Record, Vocational Form C.

The record also gives evaluations of the student by the adviser. The
advisers all have considerable teaching experience as well as training in
educational and vocational guidance.

The names are fictitious, but otherwise the records are accurate. All data
on the records were obtained during the tenth grade year.

You are to decide whether the student should be placed in the regular or
accelerated science class for grade 11. In the accelerated class, students
are expected to learn at a faster rate and more intensively than in the
regular class.

You should examine the information for each student, then decide for which
class you will recommend the student. Mark that choice on your answer
sheet.

There is no limit on the number of students you place in either science
class.



21. YEARLY RECORD FOR GRADE 10

NAME Gregory Barton    AGE: 16

Intelligence Test     IQ

Achievement Test      Percentile Rank
Reading               65
Science               89
Math                  88
Social Studies        64

Kuder Interest Area   Percentile Rank
Mechanical            75
Computational         87
Scientific            83
Persuasive            64
Artistic              50
Literary              43
Musical               36
Social Service        28
Clerical              19

HOME-ROOM TEACHER: An excellent student high in achievement and ability.

ADVISER: Well-liked. Capable. Conscientious. Excellent student.

a. Accelerated science
b. Regular science
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22. YEARLY RECORD FOR GRADE 10

NAME Glen Chapman    AGE: 16

Intelligence Test                       IQ
California Test of Mental Maturity:     109

Achievement Test      Percentile Rank
Reading               26
Science               25
Math                  26
Social Studies        24

Kuder Interest Area   Percentile Rank
Mechanical            23
Computational         21
Scientific            18
Persuasive            41
Artistic              63
Literary              61
Musical               48
Social Service        83
Clerical              87

HOME-ROOM TEACHER: Glen has his heart set on becoming a scientist like his
father. Unfortunately his ability does not seem to warrant this. He
accompanies his father to the lab evenings and weekends and loves every
minute of it. He works very hard but does not seem to understand basic
scientific concepts.

ADVISER: Glen is keenly interested in all things scientific. All three science
teachers have commented to me on his interest but they are worried that his
ability is just not up to his ambitions.

a. Accelerated science
b. Regular science



23. YEARLY RECORD FOR GRADE 10

NAME Doris Shechan    AGE: 16

Intelligence Test     IQ
OTIS:                 124

Achievement Test      Percentile Rank
Reading               84
Science               82
Math                  81
Social Studies        84

Kuder Interest Area   Percentile Rank

HOME-ROOM TEACHER: This girl has no interest in anything but athletics. She
spends all of her time in the gym. Her English teacher tells me she writes
nearly all of her papers on games and sports.

ADVISER: Interested only in sports. I have talked with her about becoming a
physical education teacher but she says she wants to "play," not "teach."

a. Accelerated science
b. Regular science
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YEARLY RECORD FOR. GRADE 10 

NAME John Dfiwltt AGE: 16 


Intel 1 igence 

Test IQ 


^Achievement Percentile 
Test Rank' 


Kuder Interest Percentile 
Ares Rank 


California Test of % 
Mental Maturity: 106 


Reading 39 
Science 26 
Math 27 
Social Studies 44 


j 

■ i 

i 


HOME-ROOM TEACHER: John cares only for science. He is- never happier than when he 
is "experimenting" in the little laboratory he built in his basement at home. 


ADVISER: Very interested in science. He told me that his chief^problem was to 
decide which field of science to go into. 



a. Accelerated science 

b. Regular science 



YEARLY RECORD FOR GRADE 10 

NAME Mary Mullen AGE: 16 


j " 




Intel 1 igence 

Test IQ 


Achievement Percentile 
Test Rank 


Kuder ilnterest Percentile 
Area "Rank 


OTIS; 129 

j' 




MechaD.ical 26 
Computational ' 29 
Scientific 32 
Persuasive 43 
Artistic ( 76 
Literary 54 
Musical .40 
Social Service 65 
Clerical 94 


HOME-ROOM TEACHER: Every teacher who has this girl complains about her. She is 
near the bottom in all her. classes; her work is rarely handed in on time; she 
practically refuses to recite or to answer when called on. 


ADVISER: -I am concerned about Mary. She has no interest, no plans, no ambitions. 
She dislikes school intensely and refuses to work at anything. A very difficult 
girl. 



a. Accelerated science 

b. Regular science 
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YEARLY RECORD FOR GRADE 10 

NAME Elaine Humphrey AGE: 16 ' 


Intel 1 igence 

> test IQ 


Achievement Percentile 
Test Rank 


Kuder Interest Percentile 
Area Rank 


California Test of 
Mental Maturity: 120 


Reading 81 
Science 82 
Math ' 82 
Social Studies * 84 


Mechanical 85 
Computational . 87 
Scientific 85 
Persuasive . 16 
Artistic 49 
Literary 37 
Musical 43 
Social Service 21 
Clerical 31 


HOME-ROOM TEACHER: . 


ADVISER: ^ ' fc S^ 


a. Accelerated science 

b. Regular science 


YEARLY RECORD FOR GRADE 10 

NAME Margaret Hilton AGE: 16 






Intelligence 

Test * 10 


Achievement Percentile 
Test Rank 


Kuder Interest Percentile. 
Area. Rank 


Cal ifornia Test of 
M^Ttal Maturity: 103 




Mechanical 82 
Computational 81 ; 
Scientific 79 
Persuasive 58 * 
Artistic . 42 
Literary 46 
Musical 48 
Social Service 60 
Clerical 22 


HOME-ROOM TEACHER: Excellent student. The math teacher tells me that he has yet 
to call on Margaret for an explanation that she cannot provide. 


ADVISER: A born mathematician* Bright and capable girl, Will do well In any 
type of scientific research. 



a . Accel erated science 

b. Regular 'science 
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YEARLY RECORD FOR GRADE 10 

NAME Margaret Nielson AGE: 16 ■ ' -' 


Intel! igence 

Test IQ 


Achievement Percentile 
Test J Rank 


Kuder Interest Percentile 
Area Rank 


Cal i form' a Test of , 
Mental Maturity: 109 


Reading 23 
Science 26 
hath 24 
Social Studies 26 


Mechanical 21 
Computational 17 
Scientific 26 
Persuasive 40 
Artistic 37 
Literary 53 
Musical 63 
Social Service 25 
Clerical 87 


HOME-ROOM TEACHER: Margaret is a capable and industrious student:. . She does good 
work in all her classes and is very popular with both her teachers a-nd her peers. 


ADVISER: This girl has yet to make a firm decision regarding her future. Her 
chief interest lies in working in a hospital, but she does not want to become 
a nurse. I have discussed the possi bi 1 i ties of her becoming 1 a laboratory 
technician, an X-ray technician, or doing medical research. Of these she 
prefers the last. Her interest and capability in science would make this 
a good choice for her. 


a. Accelerated science - 

b. Regular science 

% 


YEARLY RECORD FOR GRADE 10 

NAME Mildred Learch . AGE: 16 






Intel! igence 

Test IQ 


Achievement Percentile 
Test Rank 


Ku3er Interest Percentile 
Area Rank 


Cal ifornia Test of 
Mental Maturity: 106 

1 ■ 


Reading . 23 .. 
Science 25 
Math 23 
Social Studies 26 


Mechanical 67 
Computational 81 
Scientific 93 
Persuasive c 63 
Artistic * 39 
Literary 41 
Musical 16 
Social Service 32 
Clerical 19 



HOME-ROOM TEACHER: A superior student. Does excellent work in all of her 
classes. . ■ * , ?i 



* ADVISER: One of our better students. No definite plans other than "college" 
as yet. ■ , ; 

a. Accelerated science 

b. Regular science . . 






30. YEARLY RECORD FOR GRADE 10

NAME Ruth Skillman    AGE: 16

Intelligence Test                       IQ
California Test of Mental Maturity:     106

Achievement Test      Percentile Rank
Reading               73
Science               85
Math                  88
Social Studies        76

Kuder Interest Area   Percentile Rank
Mechanical            68
Computational         81
Scientific            84
Persuasive            48
Artistic              53
Literary              41
Musical               37
Social Service        47
Clerical              55

HOME-ROOM TEACHER: This girl's ability is quite high. On two different occasions,
teachers have told me that when class discussion gets involved she can ask a
question that cuts right to the heart of the matter.

ADVISER: This girl wants to become a high-school teacher and I have encouraged
her in this. She is of superior ability and I believe she will be quite
successful in working with students.

a. Accelerated science
b. Regular science



31. YEARLY RECORD FOR GRADE 10

NAME Morton Dawson    AGE: 16

Intelligence Test     IQ

Achievement Test      Percentile Rank

Kuder Interest Area   Percentile Rank
Mechanical            17
Computational         28
Scientific            31
Persuasive            62
Artistic              24
Literary              23
Musical               19
Social Service        48
Clerical              71

HOME-ROOM TEACHER: Poor student. Limited ability.

ADVISER: Plans to become a chemist like his father and brother but his low
ability and achievement make this possibility unlikely.

a. Accelerated science
b. Regular science






32. YEARLY RECORD FOR GRADE 10

NAME Catherine Kenny    AGE: 16

Intelligence Test     IQ

Achievement Test      Percentile Rank
Reading               26
Science               26
Math                  24
Social Studies        23

Kuder Interest Area   Percentile Rank
Mechanical            23
Computational         21
Scientific            18
Persuasive            49
Artistic              53
Literary              57
Musical               36
Social Service        72
Clerical              89

HOME-ROOM TEACHER: Catherine is a very conscientious student who gets along
well with everyone. Although she works very hard and gets good marks she
does not always seem to "grasp" the essentials.

ADVISER: Is seriously considering becoming a high-school science teacher.

a. Accelerated science
b. Regular science



33, 



YEARLY RECORD fOft 6RA0£ 10 




NAME Martin Anderson 


.AGE: 16 










Intelligence 

Test * IQ 


Achievement Percentile 
Test * Rank 


Kuder Interest Percentile 
Area . Rank 


California Test, of 
Mental Maturity; 121 






HOME-ROOM TEACHER: This boy is near the bottom of his class in achievement. . 
Many teachers have commented to me about his poor work. 


ADVISER: Poor worker. Very low in achievement. Interested only in athletics. 
Talks of being a professional athlete. . 


a. . Accelerated science 

b. Regular science 







34. 



YEARLY RECORD FOR GRADE 10 ^ 
NAME Burt Ingram ' * AGE: 16 


Intel 1 igence 

Test IQ 


Achievement Percentile 
Test Rank 


Kuder Interest Percentile 
Area . Rank 






i . •. 

* 


HOME-ROOM TEACHER-: Inferior ability and achievement, 
most of his clashes. 


Does failing work in 


ADVISER: No interest in school or any of his classes. Spends most of his time 
fwith his gang hanging around street corners. Below average in ability and 
achievement. 


'a. Accelerated science * 
b. Regular science *. / 


YEARLY R.EC0RD FOR GRADE 10 

" . K ' (■ 

NAME " Bill Turner' ■ AGE: 16. " 






Intelligence 

Test IQ 


Ach i evement Percent i 1 e 
Test " Rank 


Kuder Interest Percentile 
Area Rank 


Cal ifornia Test of* 
Mental Maturity: 123 


Reading 6.4 
Science . .44 
Math ^ / . 41 
Social' Studies 72 


Mechanical 3& 
Computational 50 
Scientific . * 41 
Persuasive 63 
Artistic 81 
Literary 78g^ 
' Musical 8Z 
Social Service 46 
Clerical 79 



35. 



frequently doubt that the work he hands in is his own. He rarely recites in 
class or enters into the discussion, and when called on he .seems not to 

understand the question. . ... 

ADVISER: Bill's parents have talked with me about whether to send him to college, 
but I doubt that h^Jias the abifity. Various comments about his behavior in 
acne 



class from his teacners tend to support my judgment in this. 



a. Accelerated science 

b. Regular science 






III. B. Preference

For items 21-35, you are to read each record card for a student. All
scores given are percentile ranks. The interest area scores are from
the Kuder Preference Record, Vocational Form C.

The record also gives evaluations of the student by the adviser. The advisers
all have considerable teaching experience as well as training in educational
and vocational guidance.

The names are fictitious, but otherwise the records are accurate. All
data on the records were obtained during the tenth grade year.

You are to decide whether the student should be placed in the regular
or accelerated science class for grade 11. In the accelerated class,
students are expected to learn at a faster rate and more intensively
than in the regular class.

You should examine the information for each student, then decide for which
class you will recommend the student. Mark that choice on your answer
sheet.

There is no limit on the number of students you place in either science
class.



21.

YEARLY RECORD FOR GRADE 10
NAME Gregory Barton  AGE: 16

Intelligence Test    IQ

Achievement Test     Percentile Rank
Reading              65
Science              89
Math                 88
Social Studies       64

Kuder Interest Area  Percentile Rank
Mechanical           75
Computational        87
Scientific           83
Persuasive           64
Artistic             50
Literary             43
Musical              36
Social Service       28
Clerical             19

HOME-ROOM TEACHER: An excellent student high in achievement and ability.

ADVISER: Well-liked. Capable. Conscientious. Excellent student.

a. Accelerated science

b. Regular science






22.

YEARLY RECORD FOR GRADE 10
NAME Glen Chapman    AGE: 16

Intelligence Test    IQ
California Test of Mental Maturity: 109

Achievement Test     Percentile Rank
Reading              26
Science              25
Math                 26
Social Studies       24

Kuder Interest Area  Percentile Rank
Mechanical           23
Computational        [illegible]
Scientific           41
Persuasive           63
Artistic             61
Literary             48
Musical              [illegible]
Social Service       83
Clerical             87

HOME-ROOM TEACHER: Glen has his heart set on becoming a scientist like his
father. Unfortunately his ability does not seem to warrant this. He
accompanies his father to the lab evenings and weekends and loves every
minute of it. He works very hard but does not seem to understand basic
scientific concepts.

ADVISER: Glen is keenly interested in all things scientific. All three science
teachers have commented to me on his interest but they are worried that his
ability is just not up to his ambitions.



a. Accelerated science 

b. Regular science 



23.

YEARLY RECORD FOR GRADE 10
NAME Paul Kilgore    AGE:

Intelligence Test    IQ
OTIS:                121

Achievement Test     Percentile Rank
Reading              81
Science              83
Math                 84
Social Studies       82

Kuder Interest Area  Percentile Rank
Mechanical           85
Computational        87
Scientific           85
Persuasive           41
Artistic             16
Literary             22
Musical              19
Social Service       38
Clerical             21

HOME-ROOM TEACHER: I have heard two different teachers comment on Paul's
lackadaisical attitude and class work, and I agree with them. His ability
and achievement are both below average and his interest in his studies is nil.

ADVISER: Paul is a difficult boy to talk to. When I try to get at the reason
for his poor school work and total lack of interest he clams up and I get
nowhere. His lack of ability is as apparent to all of his teachers as it is
to me.

a. Accelerated science

b. Regular science




24.

YEARLY RECORD FOR GRADE 10
NAME Keith Warren    AGE: 16

Intelligence Test    IQ
OTIS:                123

Achievement Test     Percentile Rank
Reading              84
Science              81
Math                 81
Social Studies       83

Kuder Interest Area  Percentile Rank
Mechanical           33
Computational        45
Scientific           37
Persuasive           81
Artistic             69
Literary             67
Musical              [illegible]
Social Service       [illegible]
Clerical             49

HOME-ROOM TEACHER: He has come into serious conflict with two of his teachers.
His achievement is very low, and in ability he is near the bottom of his class.

ADVISER:

a. Accelerated science

b. Regular science



25.

YEARLY RECORD FOR GRADE 10
NAME Kathy Parker    AGE:

Intelligence Test    IQ

Achievement Test     Percentile Rank

Kuder Interest Area  Percentile Rank
Mechanical           81
Computational        79
Scientific           84
Persuasive           31
Artistic             [illegible]
Literary             79
Musical              42
Social Service       37
Clerical             61

HOME-ROOM TEACHER: An excellent student. Stands high in all of her classes,
but is especially interested in English and literature.

ADVISER: Plans to become a writer. Superior in ability and achievement. I
have discussed colleges and college courses with her in detail.

a. Accelerated science

b. Regular science



26.

YEARLY RECORD FOR GRADE 10
NAME Ruth Changer    AGE: 16

Intelligence Test    IQ

Achievement Test     Percentile Rank
Reading              65
Science              83
Math                 80
Social Studies       63

Kuder Interest Area  Percentile Rank
Mechanical           74
Computational        82
Scientific           86
Persuasive           31
Artistic             16
Literary             25
Musical              33
Social Service       45
Clerical             59

HOME-ROOM TEACHER: A bright girl but is below average in achievement. More
interested in her duties as cheer-leader than in her school work.

ADVISER: A pleasant and popular girl. Does not work up to her full capability.
Plans to become a beautician and work in her sister's beauty parlor.

a. Accelerated science

b. Regular science



27.

YEARLY RECORD FOR GRADE 10
NAME Joyce Durwith   AGE: 16

Intelligence Test    IQ
California Test of Mental Maturity: 109

Achievement Test     Percentile Rank

Kuder Interest Area  Percentile Rank

HOME-ROOM TEACHER: A very capable girl. Does well in all of her classes.

ADVISER: Very good student. Have talked with her about going on to college.
She plans to study nuclear physics.

a. Accelerated science

b. Regular science






28.

YEARLY RECORD FOR GRADE 10
NAME Alex Crane      AGE: 16

Intelligence Test    IQ

Achievement Test     Percentile Rank

Kuder Interest Area  Percentile Rank

HOME-ROOM TEACHER: A top-notch student. Several teachers have commented to me
about what a pleasure it is to have Alex in their classes. His work is always
well done and always in on time. He seems interested in everything.

ADVISER: This boy's only problem is in deciding what most interests him. He
enjoys all of his classes and does very good work in all of them. To date
he has considered Law, Medicine, Politics, and Teaching.



a. Accelerated science 

b. Regular science 



29.

YEARLY RECORD FOR GRADE 10
NAME Frances Delong  AGE: 16

Intelligence Test    IQ
OTIS:                129

Achievement Test     Percentile Rank
Reading              84
Science              82
Math                 81
Social Studies       81

Kuder Interest Area  Percentile Rank

HOME-ROOM TEACHER: This girl is a problem! Her work is very poor, her ability
is definitely below average, and her attitude toward school and her teachers
worse than both. Every teacher complains of her poor attitude and lack of
interest.

ADVISER: If this girl has any interests I cannot locate them. I have talked
with her several times, but no success. Her lack of ability and
achievement are all part of the same picture.

a. Accelerated science

b. Regular science






30.

YEARLY RECORD FOR GRADE 10
NAME Darrell O'Rourke  AGE: 16

Intelligence Test    IQ

Achievement Test     Percentile Rank
Reading              28
Science              17
Math                 14
Social Studies       26

Kuder Interest Area  Percentile Rank
Mechanical           42
Computational        39
Scientific           43
Persuasive           84
Artistic             68
Literary             42
Musical              27
Social Service       65
Clerical             79

HOME-ROOM TEACHER: Below average student, quite limited in ability and
achievement. Careless about his work. Dislikes school.

ADVISER: Ability and achievement are both limited.

a. Accelerated science

b. Regular science






31.

YEARLY RECORD FOR GRADE 10
NAME Bernice Eager   AGE: 16

Intelligence Test    IQ
Kuhlmann-Anderson

Achievement Test     Percentile Rank
Reading              84
Science              84
Math                 84
Social Studies       81

Kuder Interest Area  Percentile Rank
Mechanical           87
Computational        85
Scientific           93
Persuasive           40
Artistic             27
Literary             36
Musical              31
Social Service       43
Clerical             22

HOME-ROOM TEACHER: Bernice is extremely bright. She loves her work in home
economics and dreams of the day when she will have her own home and family.
She has no interest in anything except home-planning and home-management.

ADVISER: This girl's strong interest in home economics and her very high
ability has led me to suggest that she enter this field professionally.
She will have none of it. She has no interest in anything other than
becoming a wife and mother.






32.

YEARLY RECORD FOR GRADE 10
NAME Carroll Scott   AGE: 16

Intelligence Test    IQ
OTIS:                120

Achievement Test     Percentile Rank
Reading              81
Science              84
Math                 83
Social Studies       81

Kuder Interest Area  Percentile Rank
Mechanical           87
Computational        86
Scientific           93
Persuasive           40
Artistic             18
Literary             26
Musical              38
Social Service       54
Clerical             19

HOME-ROOM TEACHER: Below average in achievement, work is sloppy and never in
on time. The only teacher who has not commented on this is the physical
education teacher. She always gets A's in physical education.

ADVISER: This girl's low achievement will prevent her from being successful in
college. She is planning to attend college, and I have several times
warned her that unless her achievement improves she will have difficulty
in gaining admittance. She plans to become a physical-education teacher.



a. Accelerated science 

b. Regular science 



33.

YEARLY RECORD FOR GRADE 10
NAME Michael Vaughan  AGE: 16

Intelligence Test    IQ
OTIS:

Achievement Test     Percentile Rank
Reading              23
Science              24
Math                 26
Social Studies       24

Kuder Interest Area  Percentile Rank
Mechanical           21
Computational        18
Scientific           24
Persuasive           36
Artistic             41
Literary             32
Musical              58
Social Service       85
Clerical             79

HOME-ROOM TEACHER: A very hard-working student. Gets good grades.

ADVISER: Mike plans to become a high-school science teacher and I have
encouraged him in this. I talked with his chemistry teacher who told me
of the excellent work Mike did on his science projects. It seems as
though he spent more time and did a more thorough job than anyone else
in the class.

a. Accelerated science

b. Regular science






34.

YEARLY RECORD FOR GRADE 10
NAME Robert Elliott  AGE: 16

Intelligence Test    IQ
California Test of Mental Maturity: 108

Achievement Test     Percentile Rank
Reading              24
Science              26
Math                 26
Social Studies       23

Kuder Interest Area  Percentile Rank

HOME-ROOM TEACHER: Robert is a capable and hard-working student. He does good
work in all of his classes. His ability is well above average.

ADVISER: Plans to become a chemist or a physician. Does excellent work in his
science classes.



a. Accelerated science 

b. Regular science 



35.

YEARLY RECORD FOR GRADE 10
NAME Norman Richardson  AGE: 16

Intelligence Test    IQ
California Test of Mental Maturity: 108

Achievement Test     Percentile Rank
Reading              23
Science              26
Math                 26
Social Studies       24

Kuder Interest Area  Percentile Rank
Mechanical           21
Computational        24
Scientific           24
Persuasive           62
Artistic             37
Literary             39
Musical              51
Social Service       78
Clerical             61

HOME-ROOM TEACHER:

ADVISER:



a. Accelerated science 

b. Regular science 






